1. Summary and discussion
1.1. Summary 

In this subsection, we shall define a class of random variables (r.v.'s) and a class 
of generalized moment functions, for which a sharp probability inequality will 
be stated. This inequality is the main result of the paper. Originally, extension 
to a wider class of moment functions was not an objective of this study. Rather, 
the aim (suggested by statistical applications considered in [53]) was to obtain 
an optimal version of the von Bahr-Esseen (vBE) inequality [60] for the (abso- 
lute) power moments. However, it turned out that such an extension to general 
moment functions provided the only apparently available way to prove the best 
possible bound for the power moments. 

Given any sequence $(S_j)_{j=1}^n$ of (real-valued) r.v.'s, let $X_j := S_j - S_{j-1}$ denote
the corresponding differences, for $j \in \overline{1,n}$, with the convention $S_0 := 0$, so that
$X_1 = S_1$; here and in what follows, for any $m$ and $n$ in the set $\{0, 1, \dots, \infty\}$ we
let $\overline{m,n}$ stand for the set of all integers $i$ such that $m \le i \le n$.

If $\mathrm{E}|X_j| < \infty$ and $\mathrm{E}(X_j \mid S_{j-1}) = 0$ for all $j \in \overline{2,n}$, let us say that the
sequence $(S_j)_{j=1}^n$ is a v-martingale (where "v" stands for "virtual"); in such a
case, let us also say that $(X_j)_{j=1}^n$ is a v-martingale difference sequence, or simply
that the $X_j$'s are v-martingale differences. Note that, for a general v-martingale
difference sequence $(X_j)_{j=1}^n$, $X_1$ may be any r.v. whatsoever; in particular, its
mean (if it exists) may or may not be 0. It is clear that any martingale $(S_j)_{j=1}^n$
is a v-martingale. Quite similarly one can define v-martingales with values in a
normed space.

Introduce the following class of generalized moment functions:

\[
\begin{aligned}
\mathcal{F}_{1,2} &:= \{f \in C^1(\mathbb{R})\colon f(0) = 0,\ f \text{ is even},\ f' \text{ is nondecreasing and concave on } [0,\infty)\} \\
&\;= \{f \in C^1(\mathbb{R})\colon f(0) = 0,\ f \text{ is even},\ f'' \text{ is nonnegative and nonincreasing on } (0,\infty)\};
\end{aligned}
\tag{1.1}
\]

here, as usual, $C^1(\mathbb{R})$ is the class of all continuously differentiable real-valued
functions on $\mathbb{R}$, and then $f''$ denotes the right derivative on $(0,\infty)$ of $f'$; on
$(-\infty,0)$, $f''$ will denote the left derivative of $f'$. It is clear that each function
$f \in \mathcal{F}_{1,2}$ is convex and hence nonnegative. Also, for each function $f \in \mathcal{F}_{1,2}$ one
has $f'(0) = 0$. It follows that $f' > 0$ on $(0,\infty)$ and hence $f > 0$ on $\mathbb{R} \setminus \{0\}$ for
any function $f \in \mathcal{F}_{1,2} \setminus \{0\}$.

Theorem 1.1.

(I) For any $f \in \mathcal{F}_{1,2} \setminus \{0\}$, $n \in \overline{2,\infty}$, and v-martingale $(S_j)_{j=1}^n$,

\[ \mathrm{E} f(S_n) \le \mathrm{E} f(X_1) + C \sum_{j=2}^n \mathrm{E} f(X_j) \tag{1.2} \]

with $C = C_f$, where

\[ C_f := \sup_{0<x<s<\infty} \frac{L_{f;s}(x)}{f(s)}, \tag{1.3} \]

\[ L_{f;s}(x) := f(x - s) - f(x) + s f'(x). \tag{1.4} \]

(II) The constant factor $C_f$ is the best possible in the sense that, for each
$f \in \mathcal{F}_{1,2} \setminus \{0\}$ and each $n \in \overline{2,\infty}$, the number $C_f$ is the smallest value of
$C$ such that inequality (1.2) holds for all v-martingales $(S_j)_{j=1}^n$; in fact,
$C_f$ is the best possible even if the differences $X_1, \dots, X_n$ are assumed to
be any independent zero-mean r.v.'s.

(III) For each $f \in \mathcal{F}_{1,2} \setminus \{0\}$,

\[ 1 \le C_f \le 2. \tag{1.5} \]

(IV) For each $C \in [1,2]$ there is some $f \in \mathcal{F}_{1,2} \setminus \{0\}$ such that $C_f = C$; in
particular, it follows that the bounds 1 and 2 on $C_f$ in (1.5) are the best
possible ones.
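
As a rough numerical illustration of definitions (1.3) and (1.4), the following sketch approximates $C_f$ from below by a grid search. The helper names `L` and `C_grid`, the grid sizes, and the choice $f = |\cdot|^{3/2}$ are assumptions made here for illustration only; a grid maximum is never larger than the true supremum.

```python
import math

def L(f, df, s, x):
    """L_{f;s}(x) = f(x - s) - f(x) + s * f'(x), as in (1.4)."""
    return f(x - s) - f(x) + s * df(x)

def C_grid(f, df, s_values, m=2000):
    """Grid approximation (from below) to sup over 0 < x < s of L_{f;s}(x) / f(s)."""
    best = 0.0
    for s in s_values:
        for k in range(1, m):
            x = s * k / m                      # interior grid point of (0, s)
            best = max(best, L(f, df, s, x) / f(s))
    return best

p = 1.5
f = lambda u: abs(u) ** p
df = lambda u: math.copysign(p * abs(u) ** (p - 1), u)   # derivative of |u|^p

# For a power function the ratio depends only on x / s, so every s gives the same value.
c_power = C_grid(f, df, s_values=[0.5, 1.0, 2.0, 10.0])
```

For this power function the result stays within the range $[1,2]$ given by (1.5).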

Since all functions $f$ in $\mathcal{F}_{1,2}$ are nonnegative, the expressions on both sides of
inequality (1.2) are well defined. At that, it is possible for the right-hand side, or
for both sides, of (1.2) to equal $\infty$. In the case when the differences $X_1, \dots, X_n$
are independent zero-mean r.v.'s, if the left-hand side of (1.2) is finite then (by
Jensen's inequality) $\mathrm{E} f(X_j) < \infty$ for each $j \in \overline{1,n}$, so that the right-hand side
is finite as well; thus, for independent zero-mean $X_1, \dots, X_n$, the two sides of
(1.2) are either both finite or both infinite.

A serious obstacle to overcome in order to obtain Theorem 1.1 was to un- 
derstand the form of the inequality to prove, including choosing the "right" 
class of moment functions and, especially, developing the conjecture on what 
the optimal constant factor Cf in the inequality can possibly be. 



1.2. Discussion 

In this subsection, we shall 

1. describe the structure of the class $\mathcal{F}_{1,2}$ as a convex cone, which will be
useful in most of the proofs, and provide examples of functions in the class
$\mathcal{F}_{1,2}$, including the (absolute) power functions and "extreme" functions
(that is, functions belonging to the extreme rays of the convex cone $\mathcal{F}_{1,2}$);

2. present a general approach to effective calculation of the best possible
constant $C_f$, with further information on this constant for the power functions
and "extreme" functions;

3. give an application to the concentration of measure for separately Lipschitz
functions on product spaces;

4. state other corollaries of the main theorem and relate the results with the
relevant ones in the literature, by von Bahr and Esseen (vBE) and other
authors.

Each of these items will be presented in a separate subsubsection.

1.2.1. Structure of the class $\mathcal{F}_{1,2}$ and examples of functions in this class

The following proposition describes the convex-cone structure of the class $\mathcal{F}_{1,2}$.

Proposition 1.2.

(I) A function $f\colon \mathbb{R} \to \mathbb{R}$ belongs to the class $\mathcal{F}_{1,2}$ if and only if there exists
a (nonnegative, possibly infinite) Borel measure $\gamma = \gamma_f$ on $(0,\infty]$ such
that $\int_{(0,\infty]} (t \wedge 1)\,\gamma(dt) < \infty$ and

\[ f(x) = \int_{(0,\infty]} \psi_t(x)\,\gamma(dt) \tag{1.6} \]

for all $x \in \mathbb{R}$, where

\[ \psi_t(x) := x^2 - (|x| - t)_+^2, \]

assuming the conventions $u_+ := 0 \vee u$, $u_+^p := (u_+)^p$, $u - \infty := -\infty$, and
$(-\infty)_+ := 0$, for all real $u$, so that $\psi_\infty(x) = x^2$ for all $x \in \mathbb{R}$. Also,

\[ \frac{1}{2t}\,\psi_t(x) \to |x| \quad \text{as } t \downarrow 0 \tag{1.7} \]

uniformly in $x \in \mathbb{R}$.

(II) For each $f \in \mathcal{F}_{1,2}$, the corresponding measure $\gamma = \gamma_f$ is unique and
determined by the condition that

\[ \gamma\bigl((x,\infty]\bigr) = \tfrac12 f''(x) \tag{1.8} \]

for all $x \in (0,\infty)$.

(III) For any $f \in \mathcal{F}_{1,2}$ and $x \in [0,\infty)$,

\[ f'(x) = \int_{(0,\infty]} \psi_t'(x)\,\gamma(dt) = 2 \int_{(0,\infty]} (x \wedge t)\,\gamma(dt). \tag{1.9} \]

Proposition 1.2 will be used in the proofs of most of the other results of this 
paper. 

Note that the rays $\mathbb{R}_+\psi_t$ corresponding to the functions $\psi_t$ (for $t \in (0,\infty]$)
are precisely the extreme rays of the convex cone $\mathcal{F}_{1,2}$, where $\mathbb{R}_+ f := \{\lambda f\colon \lambda \in
(0,\infty)\}$, for any $f \in \mathcal{F}_{1,2} \setminus \{0\}$. This follows because the rays $\mathbb{R}_+\gamma_{\psi_t} = \mathbb{R}_+\delta_t$
(with $t \in (0,\infty]$) are precisely the extreme rays of the corresponding convex
cone $\{\gamma_f\colon f \in \mathcal{F}_{1,2}\}$ of measures, where $\delta_t$ stands for the Dirac measure at the
point $t$. (A ray $\mathbb{R}_+ f$ of a convex cone is called extreme if, for any nonzero $f_1$
and $f_2$ in the cone such that $f_1 + f_2 = f$, both $f_1$ and $f_2$ must lie on the ray.)

Also, note that $\psi_t(x) = x^2\,\mathrm{I}\{|x| \le t\} + (2t|x| - t^2)\,\mathrm{I}\{|x| > t\}$, so that $\psi_t(x)$
equals $x^2$ for small enough $|x|$ and is asymptotic to $2t|x|$ as $|x| \to \infty$. Thus, the
"extreme" function $\psi_t$ is in a sense intermediate between the absolute powers
$|\cdot|^1$ and $|\cdot|^2$. So, by (1.6), all functions $f \in \mathcal{F}_{1,2}$ inherit such a property. This
should explain the choice of the notation $\mathcal{F}_{1,2}$.
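
The representation (1.6) and the tail condition (1.8) can be checked numerically on a small example. The sketch below uses a discrete measure $\gamma$ with three atoms, chosen arbitrarily for illustration; the names `psi`, `psi_piecewise`, `gamma_tail`, and `second_derivative` are hypothetical helpers, not notation from the paper.

```python
def psi(t, x):
    """psi_t(x) = x^2 - (|x| - t)_+^2; t = float('inf') gives x^2."""
    excess = max(abs(x) - t, 0.0)
    return x * x - excess * excess

def psi_piecewise(t, x):
    """Piecewise form: x^2 for |x| <= t, and 2 t |x| - t^2 for |x| > t."""
    return x * x if abs(x) <= t else 2 * t * abs(x) - t * t

# A discrete measure gamma given by (atom t, weight) pairs; an illustrative assumption.
atoms = [(0.5, 1.0), (2.0, 0.25), (float("inf"), 0.1)]

def f(x):
    """f(x) = integral of psi_t(x) gamma(dt), as in (1.6), for the discrete gamma."""
    return sum(w * psi(t, x) for t, w in atoms)

def gamma_tail(x):
    """gamma((x, inf]) for the discrete measure above."""
    return sum(w for t, w in atoms if t > x)

def second_derivative(x, h=1e-4):
    """Central second difference of f; exact up to rounding away from the atoms."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)
```

Away from the atoms, `second_derivative(x)` should agree with `2 * gamma_tail(x)`, which is condition (1.8) rearranged.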
Classes of moment functions similar to $\mathcal{F}_{1,2}$ arise naturally in extremal
problems in probability and statistics; see e.g. [18, 59, 45, 21, 47, 48, 49, 6, 7, 8, 51,
50, 43]; $\mathcal{F}_{1,2}$ is especially similar to the class 02,3 considered in [21].

Let us now give some examples of functions $f$ in $\mathcal{F}_{1,2}$. The "extreme"
functions $\psi_t$ have been already mentioned. Perhaps the most important members of
the class $\mathcal{F}_{1,2}$ are the power functions $|\cdot|^p$ with $p \in (1,2]$. The function $|\cdot|$ is
not in $\mathcal{F}_{1,2}$, since it is not in $C^1(\mathbb{R})$.

It is easy to construct many other kinds of examples of functions $f \in \mathcal{F}_{1,2}$
by (i) letting $f''$ be (on $(0,\infty)$) any function, say $g$, which is nonnegative,
nonincreasing, right-continuous, and integrable on any interval of the form $(0,u]$,
for any $u \in (0,\infty)$; then (ii) finding $f$ on $[0,\infty)$ as the solution to the following
initial value problem: $f(0) = f'(0) = 0$ and $f'' = g$ on $(0,\infty)$; and finally (iii)
extending $f$ to the entire real line $\mathbb{R}$ as an even function.

E.g., taking $g(x) = (1 + x)^{p-2}$ for $p \in (1,2)$ and $x \in (0,\infty)$, one ends up
with $f(x) = \frac{1}{p(p-1)}\bigl[(1 + |x|)^p - 1 - p|x|\bigr]$ for all $x \in \mathbb{R}$, which is asymptotic
to $\frac12 x^2$ as $x \to 0$ and to $\frac{1}{p(p-1)}|x|^p$ as $|x| \to \infty$; if the condition $p \in (1,2)$ is
replaced here by $p \in (-\infty,0) \cup (0,1)$, then $f(x)$ is asymptotic to $\frac{|x|}{1-p}$ as $|x| \to \infty$.
Similarly one can get $f(x) = e^{-|x|} - 1 + |x|$ (by starting with $g(x) = e^{-x}$ for
$x \in (0,\infty)$); $f(x) = |x| - \ln(1 + |x|)$ (with $g(x) = \frac{1}{(1+x)^2}$); $f(x) = |x| \ln(1 + |x|)$
(with $g(x) = \frac{1}{1+x} + \frac{1}{(1+x)^2}$).
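
The three-step recipe above (choose $g$, solve $f(0) = f'(0) = 0$, $f'' = g$, extend evenly) amounts to $f(x) = \int_0^{|x|} (|x| - u)\,g(u)\,du$. The following sketch evaluates this integral by the midpoint rule and can be compared with the closed forms just listed; the function name `f_from_g` and the number of nodes are assumptions made for illustration.

```python
import math

def f_from_g(g, x, m=20000):
    """Even solution of f(0) = f'(0) = 0, f'' = g on (0, inf):
    f(x) = integral over u in (0, |x|) of (|x| - u) * g(u), by the midpoint rule."""
    a = abs(x)
    h = a / m
    return sum((a - (k + 0.5) * h) * g((k + 0.5) * h) for k in range(m)) * h

# The examples of g from the text, on (0, inf).
p = 1.5
g_power = lambda u: (1.0 + u) ** (p - 2.0)
g_exp = lambda u: math.exp(-u)
g_log = lambda u: 1.0 / (1.0 + u) ** 2
g_xlog = lambda u: 1.0 / (1.0 + u) + 1.0 / (1.0 + u) ** 2
```

For instance, `f_from_g(g_exp, 2.0)` should be close to $e^{-2} - 1 + 2$.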

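Perhaps a more interesting example is the following family of functions, which
are parabolic splines (and will also be used in Remark 1.5):

\[ f_{\mathrm{alt}}(x) := \frac{(|x| - x_j)^2}{2(x_j + 1)^{2/3}} + \sum_{k=0}^{j-1} \frac{(x_{k+1} - x_k)\bigl(|x| - \frac{x_k + x_{k+1}}{2}\bigr)}{(x_k + 1)^{2/3}} \tag{1.10} \]

if $x_j \le |x| < x_{j+1}$ and $j \in \overline{0,\infty}$, where $x_0 := 0$, $x_1$ is any positive real number,
and $x_j := q^{2^{j-1}} - 1$ for $q := x_1 + 1$ and all $j \in \overline{2,\infty}$, so that $x_{j+1} + 1 = (x_j + 1)^2$ for
all $j = 1, 2, \dots$ (we use the standard conventions $a^{b^c} := a^{(b^c)}$ and $\sum_{k=0}^{-1} \cdots := 0$).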

It is easy to check that $f_{\mathrm{alt}} \in \mathcal{F}_{1,2}$ and $f''_{\mathrm{alt}}(x) = (x_j + 1)^{-2/3} = (x_{j+1} + 1)^{-1/3}$
if $x_j \le |x| < x_{j+1}$ and $j \in \overline{0,\infty}$, so that the function $f''_{\mathrm{alt}}$ alternates between the
powers $(|\cdot| + 1)^{-2/3}$ and $(|\cdot| + 1)^{-1/3}$, as shown in the left panel of Figure 1. So,
one might expect that the function $f_{\mathrm{alt}}$ alternates (far away from 0) between
something like the powers $|\cdot|^{-2/3+2} = |\cdot|^{4/3}$ and $|\cdot|^{-1/3+2} = |\cdot|^{5/3}$. This
expectation is only partially justified.
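
A small sketch of the "alternating" spline may help to see the sandwich property of $f''_{\mathrm{alt}}$. The closed form used in `f_alt` below is obtained here by integrating the piecewise-constant second derivative twice with $f(0) = f'(0) = 0$; the value $x_1 = 1$ and the helper names are assumptions for illustration. The functions are valid only for $|x|$ below the last generated breakpoint.

```python
def breakpoints(x1, count):
    """x_0 = 0 and x_j = q**(2**(j-1)) - 1 for j >= 1, with q = x1 + 1."""
    q = x1 + 1.0
    return [0.0] + [q ** (2 ** (j - 1)) - 1.0 for j in range(1, count)]

def _index(x, xs):
    """Largest j with x_j <= |x|."""
    a = abs(x)
    return max(k for k, xk in enumerate(xs) if xk <= a)

def f_alt_dd(x, xs):
    """Second derivative: the constant (x_j + 1)**(-2/3) on [x_j, x_{j+1})."""
    return (xs[_index(x, xs)] + 1.0) ** (-2.0 / 3.0)

def f_alt(x, xs):
    """Parabolic spline with f(0) = f'(0) = 0 and second derivative f_alt_dd."""
    a, j = abs(x), _index(x, xs)
    total = (a - xs[j]) ** 2 / (2.0 * (xs[j] + 1.0) ** (2.0 / 3.0))
    for k in range(j):
        width = xs[k + 1] - xs[k]
        total += width * (a - 0.5 * (xs[k] + xs[k + 1])) / (xs[k] + 1.0) ** (2.0 / 3.0)
    return total

xs = breakpoints(1.0, 6)   # [0, 1, 3, 15, 255, 65535] for the assumed x1 = 1
```

For $|x| \ge x_1$ the value `f_alt_dd(x, xs)` lies between $(|x|+1)^{-2/3}$ and $(|x|+1)^{-1/3}$, touching the lower curve at each breakpoint.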

Indeed, introduce the (instantaneous) "effective" exponent of the function
$f_{\mathrm{alt}}$ at a point $x \in \mathbb{R} \setminus \{0\}$ by the formula

\[ p_{\mathrm{eff}}(x) := \log_{|x|} f_{\mathrm{alt}}(x), \quad \text{so that } f_{\mathrm{alt}}(x) = |x|^{p_{\mathrm{eff}}(x)}. \]

The following proposition shows that the effective exponent $p_{\mathrm{eff}}$ eventually, "in
the limit", alternates between $\frac32$ (rather than the expected $\frac43$) and $\frac53$. In this
sense, one might say that $f_{\mathrm{alt}}$ stays closer to $(|\cdot| + 1)^{-1/3}$ than to $(|\cdot| + 1)^{-2/3}$,
"most of the time".

Proposition 1.3.

(i) $p_{\mathrm{eff}}(x) = \tilde p_{\mathrm{eff}}(\rho(x)) + o(1)$ as $x \to \infty$, where $\tilde p_{\mathrm{eff}}(r) := \bigl(2 - \frac{2}{3r}\bigr) \vee \bigl(1 + \frac{2}{3r}\bigr)$
and $\rho(x) := \log_{x_j+1}(x + 1)$ for $x \in (x_j, x_{j+1}]$.

(ii) For each $j \in \overline{1,\infty}$, the function $\rho$ increases from 1 to 2 on the interval
$(x_j, x_{j+1}]$.

(iii) For each $j \in \overline{1,\infty}$, the approximate effective exponent $\tilde p_{\mathrm{eff}}(\rho(x))$ decreases
from $\frac53$ to $\frac32$ and then increases back to $\frac53$ as $x + 1$ increases from $x_j + 1$
to $(x_j + 1)^{4/3}$ and then on to $x_{j+1} + 1 = (x_j + 1)^2$, respectively.

Part of the graph of the (exact) effective exponent $p_{\mathrm{eff}}$ (with $x_1$ = ^-) is
shown in the right panel of Figure 1. Recall that the $x_j$'s grow very fast in $j$ for
large $j$. Therefore, for better presentation, the horizontal axis in the right panel
is nonlinearly rescaled so that the $x_j$'s appear equally spaced. Namely, what is
actually shown here is part of the graph $\{(\log_2 \log_q(x + 1),\, p_{\mathrm{eff}}(x))\colon x \ge x_1\}$;
note that $\log_2 \log_q(x_j + 1) = j - 1$ for all $j \in \overline{1,\infty}$.

Fig 1. Left panel: $f''$ (solid) for $f = f_{\mathrm{alt}}$ alternates between $(|\cdot| + 1)^{-2/3}$ (dotted) and
$(|\cdot| + 1)^{-1/3}$ (dotted). Right panel: the effective exponent $p_{\mathrm{eff}}$ (solid) for $f = f_{\mathrm{alt}}$ eventually
alternates between $\frac32$ (dotted) and $\frac53$ (dotted).



1.2.2. On the best possible constant $C_f$ in general and, in particular, for the
power and extreme functions

The following proposition concerns some general properties of the constant
factor $C_f$ for nonzero $f$ in $\mathcal{F}_{1,2}$ except for $f = \psi_\infty$; in the latter, trivial case, one
has $C_f = 1$, as also stated in Proposition 1.6; recall that $\psi_\infty(x) = x^2$ for all
$x \in \mathbb{R}$.

Proposition 1.4. Take any $f \in \mathcal{F}_{1,2} \setminus \{0, \psi_\infty\}$. Let $s_f := \inf \operatorname{supp} \gamma$, where
$\operatorname{supp} \gamma$ stands for the support of the measure $\gamma = \gamma_f$ defined in Proposition 1.2.
Recall the definition of $L_{f;s}(x)$ in (1.4). Then the following statements hold.

(i) $s_f \in [0,\infty)$.

(ii) For any $s \in (0, s_f]$, one has $L_{f;s}(x) = f(s)$ for all $x \in (0,s)$.

(iii) For any $s \in (s_f,\infty)$, one has $L'_{f;s}(0+) > 0$ and $L'_{f;s}(s-) < 0$.

(iv) For any $s \in (0,\infty)$, there is some (not necessarily unique) $x_{f;s} \in (0,s)$
such that $L_{f;s}(x)$ is nondecreasing in $x \in (0, x_{f;s}]$ and nonincreasing in
$x \in [x_{f;s}, s)$.

(v) One has

\[ C_f = \sup_{s \in (s_f,\infty)} \Bigl[\frac{1}{f(s)} \max_{x \in (0,s)} L_{f;s}(x)\Bigr] = \sup_{s \in (s_f,\infty)} \Bigl[\frac{1}{f(s)} L_{f;s}(x_{f;s})\Bigr] > 1. \]

Remark 1.5. Proposition 1.4 provides for an effective maximization of $L_{f;s}(x)$ in
$x \in (0,s)$, for any given $s \in (0,\infty)$, so that $C_f(s) := \frac{1}{f(s)} \max_{x \in (0,s)} L_{f;s}(x) =
\frac{1}{f(s)} L_{f;s}(x_{f;s})$ can be effectively found. In the important special case when $f$ is
a power function $|\cdot|^p$ (with $p \in (1,2]$), one can also use the homogeneity of $f$
in order to compute the constant $C_f$ quite effectively, as described in
Proposition 1.8. However, in general it remains to maximize $C_f(s)$ in $s \in (s_f,\infty)$. It
appears that usually $C_f(s)$ is monotonically nondecreasing in $s$, if the function
$f$ is not too irregular; one "exceptional" function $f$ for which $C_f(s)$ lacks such a
monotonicity property is a function $f_{\mathrm{alt}}$ of the "alternating" family described by
formula (1.10). Indeed, take $f = f_{\mathrm{alt}}$ with $x_1$ = ^. Then, using the Mathematica
command Maximize, one finds that £(y^) < £(^j|). One may still ask whether
it is true for all $f \in \mathcal{F}_{1,2}$ that the limit $C_f(\infty-)$ exists, and if so, whether it is
true that $C_f(s) \le C_f(\infty-)$ for all $s \in (s_f,\infty)$, so that $C_f$ can be found as $C_f(\infty-)$.
In any case, Theorem 1.1 reduces the problem of finding the optimal constant
$C$ in (1.2) to a maximization just in two real variables, $s$ and $x$, which should
not usually be numerically too difficult.

Now let us provide a simple description of the constant $C_f$ in the case when
$f$ is an "extreme" function $\psi_t$, representing the extreme rays of the convex cone
$\mathcal{F}_{1,2}$.

Proposition 1.6. One has $C_{\psi_t} = 2$ for each $t \in (0,\infty)$, whereas $C_{\psi_\infty} = 1$.

Remark 1.7. Proposition 1.6 might seem quite surprising: whereas, by
Theorem 1.1, the range of the values of $C_f$ over all nonzero $f$ in the convex cone $\mathcal{F}_{1,2}$
is the entire interval $[1,2]$, the only value that $C_f$ takes on all the extreme rays
$\mathbb{R}_+\psi_t$ with $t \in (0,\infty)$ (which, together with $\mathbb{R}_+\psi_\infty$, span the cone $\mathcal{F}_{1,2}$ in the sense of (1.6)) is 2. This suggests strong
nonlinearity of the optimal constant factor $C_f$ in $f$. However, as seen from the
proof of Proposition 1.6, the fact that $C_{\psi_t}$ is the same for all $t \in (0,\infty)$ is due
to a simple homogeneity property. Note also the discontinuity of $C_{\psi_t}$ in $t$ at
$t = \infty$.
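
The behaviour described in Proposition 1.6 and Remark 1.7 can be observed numerically. The sketch below evaluates, on a grid, the ratio $\frac{1}{\psi_t(s)} \max_{x \in (0,s)} L_{\psi_t;s}(x)$ for several $s$; the helper names `psi`, `dpsi`, and `C_at` and the grid size are assumptions for illustration, and a grid search only bounds the maximum from below.

```python
import math

def psi(t, x):
    """psi_t(x) = x^2 - (|x| - t)_+^2."""
    excess = max(abs(x) - t, 0.0)
    return x * x - excess * excess

def dpsi(t, x):
    """Derivative of psi_t: 2 * sign(x) * min(|x|, t)."""
    return 2.0 * math.copysign(min(abs(x), t), x)

def C_at(t, s, m=2000):
    """Grid value of max over x in (0, s) of L_{psi_t; s}(x), divided by psi_t(s)."""
    best = max(psi(t, x - s) - psi(t, x) + s * dpsi(t, x)
               for x in (s * k / m for k in range(1, m)))
    return best / psi(t, s)

# The ratio depends on s and t only through s / t (homogeneity), and it
# increases towards 2 without reaching it as s / t grows.
values = [C_at(1.0, s) for s in (1.0, 10.0, 100.0, 1000.0)]
```

With `t = 1` the values increase from 1 (for $s \le t$) towards 2, in line with $C_{\psi_t} = 2$ being a supremum that is approached as $s \to \infty$.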

As mentioned earlier, for any $p \in (1,2]$ the power function $|\cdot|^p$ belongs to
the class $\mathcal{F}_{1,2}$; for such $p$, consider the corresponding constant factor

\[ C_p := C_{|\cdot|^p}, \]

so that for any v-martingale $(S_j)_{j=1}^n$

\[ \mathrm{E}|S_n|^p \le \mathrm{E}|X_1|^p + C_p \sum_{j=2}^n \mathrm{E}|X_j|^p. \tag{1.11} \]

Note that $|\cdot|^2 = \psi_\infty$, so that, by Proposition 1.6,

\[ C_2 = 1. \tag{1.12} \]

Proposition 1.8.

(i) For any $p \in (1,2)$

\[ C_p = \ell(p, x_p) = \max_{x \in (0,1)} \ell(p, x), \]

where

\[ \ell(p, x) := L_{|\cdot|^p;1}(x) = (1 - x)^p - x^p + p x^{p-1} \tag{1.13} \]

for $x \in (0,1)$, and $x_p$ is the only root $x \in (0,1)$ of the equation

\[ (1 - x)^{p-1} + x^{p-1} = (p - 1) x^{p-2}. \tag{1.14} \]

Moreover, $\ell(p,x)$ is increasing in $x \in (0, x_p)$ and decreasing in $x \in (x_p, 1)$,
for each $p \in (1,2)$.
(ii) In fact, $x_p \in \bigl(\frac{p-1}{5}, \frac{p-1}{2}\bigr) \subset \bigl(0, \frac12\bigr)$ for all $p \in (1,2)$.

(iii) Further, $C_p$ is continuously (and strictly) decreasing in $p \in (1,2]$ from
$C_{1+} = 2$ to $C_2 = 1$; furthermore, $C_p$ is real-analytic in $p \in (1,2)$.

(iv) The values $C_p$ are algebraic for all rational $p \in (1,2]$; in particular, $C_{3/2} =
\sqrt{1 + \frac{1}{\sqrt2}} = 1.306\ldots$ (with $x_{3/2} = \frac14(2 - \sqrt2) = 0.146\ldots$).

(v) Explicit upper and lower bounds on $C_p$ are given by the inequalities

\[ C_p^{-,1} \vee C_p^{-,2} < C_p < C_p^{+,1} \wedge C_p^{+,2} < W_p \tag{1.15} \]

for all $p \in (1,2)$, where

\[ C_p^{-,1} := 2^{-p}\bigl((3 - p)^p + (p - 1)^{p-1}(p + 1)\bigr), \]
\[ C_p^{-,2} := 5^{-p}\bigl((6 - p)^p + (p - 1)^{p-1}(4p + 1)\bigr), \]

:=sfey((P- 1 ) P_1 ( 1 50+ 181^-152^ + 21^) 

+ (3 - p) p " 1 (450 - 381p + 152p 2 - 21p 3 )) , 

Cp' 2 : =8& ( 4 (^ ~ ^"'O 12 - 35 p + 9 V - 21 p 3 ) 

+ (6 - p) p ~ 1 {288 - I5p - 94p 2 + 21p 3 )) , 

\[ W_p := 2^{2-p}. \]

The upper bound $W_p$ on $C_p$ is exact at the endpoints of the interval $(1,2)$
in the sense that $C_{1+} = W_{1+}$ and $C_2 = C_{2-} = W_{2-} = W_2$; each of the
bounds $C_p^{-,1}$, $C_p^{-,2}$, $C_p^{+,1}$, and $C_p^{+,2}$ is also exact in the similar sense.
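
Parts (i) and (iv) of Proposition 1.8 suggest a direct way to compute $C_p$: solve (1.14) for $x_p$ and evaluate $\ell(p, x_p)$. The sketch below does this by bisection and compares the result with the two lower bounds and with $W_p$; the function names, the bracketing interval, and the tolerance are assumptions for illustration. The bisection relies on the sign pattern implied by part (i): the difference of the two sides of (1.14) is negative to the left of $x_p$ and positive to the right.

```python
import math

def ell(p, x):
    """ell(p, x) = (1 - x)^p - x^p + p x^(p-1), as in (1.13)."""
    return (1 - x) ** p - x ** p + p * x ** (p - 1)

def x_p(p, tol=1e-14):
    """Root in (0, 1) of (1 - x)^(p-1) + x^(p-1) = (p - 1) x^(p-2), by bisection."""
    h = lambda x: (1 - x) ** (p - 1) + x ** (p - 1) - (p - 1) * x ** (p - 2)
    lo, hi = 1e-12, 1 - 1e-12          # h(lo) < 0 < h(hi) for p in (1, 2)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def C(p):
    """C_p for p in (1, 2]; C_2 = 1 by (1.12)."""
    return 1.0 if p == 2 else ell(p, x_p(p))

def lower_1(p):
    """C_p^{-,1}; this equals ell(p, (p - 1) / 2)."""
    return 2 ** (-p) * ((3 - p) ** p + (p - 1) ** (p - 1) * (p + 1))

def lower_2(p):
    """C_p^{-,2}; this equals ell(p, (p - 1) / 5)."""
    return 5 ** (-p) * ((6 - p) ** p + (p - 1) ** (p - 1) * (4 * p + 1))

def W(p):
    return 2 ** (2 - p)
```

Since both lower bounds are values of $\ell(p,\cdot)$ at particular points, they cannot exceed the maximum $C_p$; the comparison with $W_p$ is the nontrivial part.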
Fig 2. The ratios of $C_p$ (black), $C_p^{-,1}$ (red), $C_p^{-,2}$ (orange), $C_p^{+,1}$ (green), $C_p^{+,2}$
(blue), and $W_p$ (magenta) to $2^{2-p}$.

The graphs of the ratios of $C_p$, $C_p^{-,1}$, $C_p^{-,2}$, $C_p^{+,1}$, $C_p^{+,2}$, and $W_p$ to $W_p = 2^{2-p}$
are shown in Figure 2. The graph of $C_p$, in comparison with $W_p$ and the von
Bahr-Esseen constant $C_p^{\mathrm{vBE}}$, is presented in Figure 4.

As mentioned in Subsubsection 1.2.1, the absolute-value function $|\cdot|$ is not in
the class $\mathcal{F}_{1,2}$. However, by (1.7), $|\cdot|$ is in the closure of $\mathcal{F}_{1,2}$ with respect to the
uniform convergence on $\mathbb{R}$. It is also clear that inequality (1.2) holds for $f = |\cdot|$
(and any r.v.'s $X_1, \dots, X_n$) with $C = C_1 := 1$. From this viewpoint, there is a
discontinuity of $C_p$ at $p = 1$, namely, $C_{1+} = 2 \ne 1 = C_1$.

1.2.3. Application: concentration inequalities for separately Lipschitz functions
on product spaces

Let $X_1, \dots, X_n$ be independent r.v.'s with values in measurable spaces $\mathfrak{X}_1, \dots, \mathfrak{X}_n$,
respectively. Let $g\colon \mathfrak{P} \to \mathbb{R}$ be a measurable function on the product space
$\mathfrak{P} := \mathfrak{X}_1 \times \cdots \times \mathfrak{X}_n$. Let us say (cf. [9, 50]) that $g$ is separately Lipschitz if it
satisfies a Lipschitz type condition in each of its arguments:

\[ |g(x_1, \dots, x_{i-1}, \tilde x_i, x_{i+1}, \dots, x_n) - g(x_1, \dots, x_n)| \le \rho_i(\tilde x_i, x_i) \tag{1.16} \]

for some measurable functions $\rho_i\colon \mathfrak{X}_i \times \mathfrak{X}_i \to \mathbb{R}$ and all $i \in \overline{1,n}$, $(x_1, \dots, x_n) \in \mathfrak{P}$,
and $\tilde x_i \in \mathfrak{X}_i$.

Take now any separately Lipschitz function $g$ and let

\[ Y := g(X_1, \dots, X_n). \]

Suppose that the r.v. $Y$ has a finite mean. Then one has the following.

Corollary 1.9. For each $i \in \overline{1,n}$, take any $x_i \in \mathfrak{X}_i$.

(I) For any $f \in \mathcal{F}_{1,2} \setminus \{0\}$

\[ \mathrm{E} f(Y) \le f(\mathrm{E} Y) + \kappa_f C_f \sum_{i=1}^n \mathrm{E} f\bigl(\rho_i(X_i, x_i)\bigr), \tag{1.17} \]

where

\[ \kappa_f := \sup\Bigl\{\frac{c f(s - c) + (s - c) f(c)}{U_f(c, s, a)}\colon s \in (0,\infty),\ c \in \bigl(0, \tfrac{s}{2}\bigr),\ a \in (0, c)\Bigr\} \in [1,2], \tag{1.18} \]

\[ U_f(c, s, a) := c f(s - c + a) + (s - c) f(a - c) \tag{1.19} \]

(the above definition of $\kappa_f$ is correct, because $f > 0$ on $\mathbb{R} \setminus \{0\}$ and hence
$U_f(c, s, a) > 0$ for any $s \in (0,\infty)$, $c \in (0, \frac{s}{2})$, and $a \in (0, c)$).

(II) For any $p \in (1,2]$

\[ \mathrm{E}|Y|^p \le |\mathrm{E} Y|^p + \kappa_p C_p \sum_{i=1}^n \mathrm{E}|\rho_i(X_i, x_i)|^p, \tag{1.20} \]

where

\[ \kappa_p := \kappa_{|\cdot|^p} = \max_{c \in [0,1/2]} \Bigl[\bigl(c^{p-1} + (1 - c)^{p-1}\bigr)\bigl(c^{\frac{1}{p-1}} + (1 - c)^{\frac{1}{p-1}}\bigr)^{p-1}\Bigr]. \tag{1.21} \]

Moreover, $\kappa_p$ continuously and strictly decreases in $p \in (1,2]$ from 2 to
1. Furthermore, the values of $\kappa_p$ are algebraic for all rational $p \in (1,2]$;
in particular, $\kappa_{3/2} = \frac19\sqrt{51 + 21\sqrt7} = 1.14\ldots$, corresponding to $c =
\frac16\bigl(3 - \sqrt{1 + 2\sqrt7}\bigr) = 0.081\ldots$ in (1.21). The graph of $\kappa_p$ is shown in
Figure 3.
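
Formula (1.21) is a one-dimensional maximization and is easy to evaluate. The sketch below uses a plain grid search over $c \in [0, 1/2]$; the function names and the grid size are assumptions for illustration, and the grid value is a lower approximation to the maximum.

```python
import math

def kappa_term(p, c):
    """The bracketed expression in (1.21) for given p in (1, 2] and c in [0, 1/2]."""
    q = 1.0 / (p - 1.0)
    return (c ** (p - 1) + (1 - c) ** (p - 1)) * (c ** q + (1 - c) ** q) ** (p - 1)

def kappa(p, m=100000):
    """Grid maximum of kappa_term(p, c) over c in [0, 1/2]; returns (value, argmax)."""
    best_c, best = 0.0, kappa_term(p, 0.0)
    for k in range(1, m + 1):
        c = 0.5 * k / m
        v = kappa_term(p, c)
        if v > best:
            best_c, best = c, v
    return best, best_c

kappa_32, c_32 = kappa(1.5)
```

For $p = 3/2$ this can be compared with the closed-form value $\frac19\sqrt{51 + 21\sqrt7}$ stated in the corollary.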




Fig 3. $\kappa_p$, solid; 1, dotted.

One can observe some similarity between $C_f$, $C_p$ and $\kappa_f$, $\kappa_p$.

Thus, going from the "one-dimensional" inequality (1.2) or (1.11) for
v-martingales to the "multi-dimensional" measure concentration inequality (1.17)
or (1.20) entails an extra factor, $\kappa_f$ or $\kappa_p$, whose values are between 1 and 2.

The proof of Corollary 1.9 is partly based on the following proposition,
which may be of independent interest.


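Proposition 1.10. For any zero-mean r.v. $X$, $f \in \mathcal{F}_{1,2} \setminus \{0\}$, and $a \in \mathbb{R}$

\[ \mathrm{E} f(X) \le \kappa\, \mathrm{E} f(X + a) \tag{1.22} \]

with $\kappa = \kappa_f$, and $\kappa_f$ is the best possible constant $\kappa$ in (1.22).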

In turn, the proof of Proposition 1.10 uses 

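Proposition 1.11. Take any $f \in \mathcal{F}_{1,2} \setminus \{0\}$, $s \in (0,\infty)$, and $c \in (0, \frac{s}{2})$. Then
$U_f(c,s,a)$ (defined in (1.19)) is convex in $a \in \mathbb{R}$. Moreover, $U_f(c,s,a)$ attains
its minimum over all $a \in \mathbb{R}$ at a unique point $a_{f;c,s} \in [0,c)$. In particular, for
all $t \in (0,\infty)$, $s \in (0,\infty)$, and $c \in (0, \frac{s}{2})$

\[ a_{\psi_t;c,s} = \frac{c}{s - c}\,(s - c - t)_+ \tag{1.23} \]

and $\kappa_{\psi_t} = 2$.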
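
The closed form in (1.23), read here as $a_{\psi_t;c,s} = \frac{c}{s-c}(s-c-t)_+$, can be compared with a direct numerical minimization of $U_{\psi_t}(c,s,a)$ over $a \in [0,c]$. Since the function is convex in $a$, a ternary search suffices; the helper names and the iteration count below are assumptions for illustration.

```python
def psi(t, x):
    """psi_t(x) = x^2 - (|x| - t)_+^2."""
    excess = max(abs(x) - t, 0.0)
    return x * x - excess * excess

def U(t, c, s, a):
    """U_f(c, s, a) = c f(s - c + a) + (s - c) f(a - c) with f = psi_t, as in (1.19)."""
    return c * psi(t, s - c + a) + (s - c) * psi(t, a - c)

def argmin_convex(g, lo, hi, iters=200):
    """Ternary search for a minimizer of a convex function g on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if g(m1) <= g(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

def a_closed(t, c, s):
    """Closed-form minimizer c (s - c - t)_+ / (s - c), as read from (1.23)."""
    return c * max(s - c - t, 0.0) / (s - c)

def a_numeric(t, c, s):
    return argmin_convex(lambda a: U(t, c, s, a), 0.0, c)
```

For example, with $t = 1$, $s = 4$, $c = 1$ both routes give a minimizer near $2/3$, while for $s - c \le t$ the minimizer is $0$.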

On the other hand, Proposition 1.11 obviously complements Corollary 1.9.
A difficulty in proving the uniqueness of the minimizer of $U_f(c,s,a)$ in $a$ in
Proposition 1.11 is that, in general, $U_f(c,s,a)$ is not strictly convex in $a$.

An example of separately Lipschitz functions $g\colon \mathfrak{X}^n \to \mathbb{R}$ is given by the
formula $g(x_1, \dots, x_n) = \|x_1 + \cdots + x_n\|$ for all $x_1, \dots, x_n$ in a separable Banach
space $(\mathfrak{X}, \|\cdot\|)$. In this case, one may take $\rho_i(\tilde x_i, x_i) = \|\tilde x_i - x_i\|$. Thus, one
obtains

Corollary 1.12. Let $X_1, \dots, X_n$ be independent random vectors in the Banach
space $(\mathfrak{X}, \|\cdot\|)$. Let $S_n := X_1 + \cdots + X_n$. For each $i \in \overline{1,n}$, take any $x_i \in \mathfrak{X}$.
Then for any $f \in \mathcal{F}_{1,2} \setminus \{0\}$

\[ \mathrm{E} f(\|S_n\|) \le f(\mathrm{E}\|S_n\|) + \kappa_f C_f \sum_{i=1}^n \mathrm{E} f(\|X_i - x_i\|). \tag{1.24} \]

Moreover, for any $p \in (1,2]$

\[ \mathrm{E}\|S_n\|^p \le (\mathrm{E}\|S_n\|)^p + \kappa_p C_p \sum_{i=1}^n \mathrm{E}\|X_i - x_i\|^p. \tag{1.25} \]

i=l 

For $p = 2$, inequality (1.25) was obtained in [54, Theorem 4], based on an
improvement of the method of Yurinskii (1974) [27]; cf. [36, 37, 9], [50, Section 4],
and [46, Proposition 2.5]. The proof of Corollary 1.9 is based in part on the
same kind of improvement.

As can be seen from that proof, both Corollaries 1.9 and 1.12 will hold even
if the separately-Lipschitz condition (1.16) is relaxed to

\[ |\mathrm{E} g(x_1, \dots, x_{i-1}, \tilde x_i, X_{i+1}, \dots, X_n) - \mathrm{E} g(x_1, \dots, x_i, X_{i+1}, \dots, X_n)| \le \rho_i(\tilde x_i, x_i). \tag{1.26} \]

Note also that in Corollaries 1.9 and 1.12 the r.v.'s $X_i$ do not have to be
zero-mean, or even to have any definable mean; at that, the arbitrarily chosen $x_i$'s
may act as the centers, in some sense, of the distributions of the corresponding
$X_i$'s.

Clearly, the separate-Lipschitz (sep-Lip) condition (1.16) is easier to check 
than a joint-Lipschitz one. Also, sep-Lip (especially in the relaxed form (1.26)) is 
more generally applicable. On the other hand, when a joint-Lipschitz condition 
is satisfied, one can generally obtain better bounds. Literature on the concen- 
tration of measure phenomenon, almost all of it for joint-Lipschitz settings, is 
vast; let us mention here only [31, 29, 28, 10, 30]. 

1.2.4. Other corollaries of Theorem 1.1 and comparisons with known results

Take any $p \in (1,2]$. A normed space $(\mathfrak{X}, \|\cdot\|)$ (or, briefly, $\mathfrak{X}$) is called $p$-uniformly
smooth [1] if for some constant $D \in (0,\infty)$ (referred to as a $p$-uniform
smoothness constant of $\mathfrak{X}$) and all $x$ and $y$ in $\mathfrak{X}$ one has $\frac12\bigl(\|x + y\|^p + \|x - y\|^p\bigr) \le
\|x\|^p + D^p\|y\|^p$ or, equivalently,

\[ \mathrm{E}\|x + Xy\|^p \le \|x\|^p + D^p\, \mathrm{E}|X|^p\, \|y\|^p \tag{1.27} \]

for all symmetric(ally distributed) real-valued r.v.'s $X$. If $\mathfrak{X}$ is $p$-uniformly smooth
with a $p$-uniform smoothness constant $D$, let us say that $\mathfrak{X}$ is $(p,D)$-uniformly
smooth or, simply, $(p,D)$-smooth. For instance, for any $q \in [2,\infty)$ the space
$L^q(\mu)$ is $(2,D)$-smooth with $D = \sqrt{q - 1}$, which is the best possible constant of
the 2-uniform smoothness as long as the space $L^q(\mu)$ is at least two-dimensional;
see [46, Proposition 2.1], [1, Proposition 3], [17, Corollary 2.8].

Dual to the notion of $(p,D)$-uniform smoothness is that of $(q, D^{-1})$-uniform
convexity, whose definition can be obtained by reversing the inequality sign in
(1.27) and replacing there $p$ and $D$ by $q$ and $D^{-1}$, respectively; here, $\frac1p + \frac1q = 1$.
In particular, a result due to Ball, Carlen, and Lieb [1, Lemma 5] is that $\mathfrak{X}$
is $(p,D)$-uniformly smooth iff its dual $\mathfrak{X}^*$ is $(q, D^{-1})$-uniformly convex; cf. e.g.
[16, 32]. Note that $q$-uniform convexity and $p$-uniform smoothness are
refinements of the notions of uniform convexity and uniform smoothness, which go
back to Clarkson [12] and Day [16]; cf. [25, 62]. These notions are important in
functional analysis. In particular, Pisier [55] showed that every super-reflexive
space is $q$-uniformly convex and $p$-uniformly smooth for some $q$ and some $p$; an
earlier result due to Enflo [19] stated that $\mathfrak{X}$ is super-reflexive iff it is isomorphic
to a uniformly convex space. Among many other results, Pisier [55] also showed
that the super-reflexivity is equivalent to the super-Radon-Nikodym property.
Applications of the 2-uniform convexity/2-uniform smoothness to Finsler
manifolds were given by Ohta [41].

It is clear that $\mathfrak{X}$ is $(p,D)$-smooth iff inequality (1.2) with $C = D^p$ and
$f = \|\cdot\|^p$ holds for all martingales (or even v-martingales) $(S_j)_{j=1}^n$ with values in
$\mathfrak{X}$ and conditionally symmetric differences $X_2, \dots, X_n$; by symmetrization, the
same inequality will then hold without the conditional symmetry restriction, but
with the worse constant $C = (2D)^p$ instead of $C = D^p$. These considerations
suggest the following.

Let us say that the space $\mathfrak{X}$ is completely $(p,D)$-smooth if inequality (1.27)
holds for all zero-mean real-valued r.v.'s $X$ (and all $x$ and $y$ in $\mathfrak{X}$). It is clear that
$\mathfrak{X}$ is completely $(p,D)$-smooth iff inequality (1.2) with $C = D^p$ and $f = \|\cdot\|^p$
holds for all martingales (or even v-martingales) $(S_j)_{j=1}^n$ with values in $\mathfrak{X}$. Also,
Proposition 1.8 immediately implies

Corollary 1.13. Take any $p \in (1,2]$ and any measure $\mu$ on any measurable
space. Then the space $L^p(\mu)$ is completely $(p,D)$-smooth with the best possible
constant $D = C_p^{1/p}$. So, for any $n \in \overline{2,\infty}$ and v-martingale $(S_j)_{j=1}^n$ with values
in $L^p(\mu)$,

\[ \mathrm{E}\|S_n\|^p \le \mathrm{E}\|X_1\|^p + C_p \sum_{j=2}^n \mathrm{E}\|X_j\|^p \tag{1.28} \]

(cf. (1.25)).

The above discussion suggests that the form of inequality (1.2) is rather
natural in such contexts as concentration of measure, uniform smoothness, and
martingales (or v-martingales). Yet, in the case when the differences $X_1, \dots, X_n$
are independent real-valued zero-mean r.v.'s, the form of the following
immediate corollary of Theorem 1.1 may be more relevant.

Corollary 1.14. For any $f \in \mathcal{F}_{1,2} \setminus \{0\}$, $n \in \overline{2,\infty}$, and (real-valued)
v-martingale $(S_j)_{j=1}^n$,

\[ \mathrm{E} f(S_n) \le K \sum_{j=1}^n \mathrm{E} f(X_j) \tag{1.29} \]

with $K = C_f$.

However, in inequality (1.29) the constant factor $K = C_f$ is no longer the
best possible one, at least for independent zero-mean $X_j$'s. One way to reduce
the constant is as follows. In the conditions of Corollary 1.14, rewrite the
right-hand side of (1.2) with $C = C_f$ as $C_f \sum_{j=1}^n \mathrm{E} f(X_j) - (C_f - 1)\, \mathrm{E} f(X_1)$. Then,
assuming that $\mathrm{E} f(X_1) \ge \frac{\lambda}{n} \sum_{j=1}^n \mathrm{E} f(X_j)$ for some $\lambda \in (0,\infty)$, one sees that
the constant factor $K = C_f$ in (1.29) can be reduced by spreading the "excess"
$C_f - 1$ over all the summands $\mathrm{E} f(X_1), \dots, \mathrm{E} f(X_n)$, to get (1.29) with

\[ K = C_f - \frac{\lambda}{n}(C_f - 1) \le C_f. \tag{1.30} \]

To develop this simple observation a bit further, let us take any $\lambda \in (0,\infty)$
and say that a sequence $(S_j)_{j=1}^n$ is a $\lambda$-good rearranged-v-martingale if there are
(i) some $i \in \overline{1,n}$ such that $\mathrm{E} f(X_i) \ge \frac{\lambda}{n} \sum_{j=1}^n \mathrm{E} f(X_j)$ and (ii) a permutation
$(j_1, \dots, j_{n-1})$ of the set $\overline{1,n} \setminus \{i\}$ such that $(X_i, X_{j_1}, \dots, X_{j_{n-1}})$ is the difference
sequence of a v-martingale. Note that, if the differences $X_1, \dots, X_n$ of a sequence
$(S_j)_{j=1}^n$ are independent zero-mean r.v.'s, then $(S_j)_{j=1}^n$ is a 1-good
rearranged-v-martingale. (In general, a $\lambda$-good rearranged-v-martingale does not have to
be a v-martingale.) Thus, one obtains

Corollary 1.15. For any $f \in \mathcal{F}_{1,2} \setminus \{0\}$, $n \in \overline{2,\infty}$, and $\lambda$-good
rearranged-v-martingale $(S_j)_{j=1}^n$, inequality (1.29) holds, again with $K$ as in (1.30).

In the special case of the power functions $|\cdot|^p$ (with $p \in (1,2)$) in place of
general $f \in \mathcal{F}_{1,2} \setminus \{0\}$, an inequality of the form (1.29) was obtained by von
Bahr and Esseen (vBE) [60]:

\[ \mathrm{E}|S_n|^p \le K \sum_{j=1}^n \mathrm{E}|X_j|^p, \tag{1.31} \]

with the constant factor $K = 2 - \frac1n = 2 - \frac1n(2 - 1)$, which, by part (iii) of
Proposition 1.8, is greater than the $K$ in (1.30) with $\lambda = 1$, again for $f = |\cdot|^p$ with $p \in (1,2)$.
The vBE inequality (1.31) has been used in various kinds of studies, see e.g.
[4, 3, 57, 39, 35, 23, 38, 5, 13, 26, 20, 2, 56], among the more recent articles.
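
The comparison between the factor in (1.30) and the vBE factor $2 - \frac1n$ is simple arithmetic, sketched below for $p = 3/2$ and $\lambda = 1$. The function names are assumptions for illustration; the value of $C_{3/2}$ is the one stated in Proposition 1.8(iv).

```python
import math

def K_rearranged(C_f, lam, n):
    """K = C_f - (lam / n) * (C_f - 1), as in (1.30)."""
    return C_f - (lam / n) * (C_f - 1.0)

def K_vbe(n):
    """The von Bahr-Esseen factor 2 - 1/n."""
    return 2.0 - 1.0 / n

C_32 = math.sqrt(1.0 + 1.0 / math.sqrt(2.0))   # C_{3/2} = 1.306...

# Rows of (n, K from (1.30) with lam = 1, vBE factor).
table = [(n, K_rearranged(C_32, 1.0, n), K_vbe(n)) for n in (2, 5, 20, 100)]
```

Note that `K_vbe(n)` coincides with `K_rearranged(2.0, 1.0, n)`, so the improvement comes entirely from replacing 2 by the smaller constant $C_p$.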

As noted by vBE [60], the special case of inequality (1.31) (with $K = 1$) when
the conditional distributions of the differences $X_i$ given $S_{i-1}$ are symmetric for
all $i \in \overline{2,n}$ easily follows from Clarkson's inequality [12]

\[ |x + y|^p + |x - y|^p \le 2|x|^p + 2|y|^p \tag{1.32} \]

for all real $x$ and $y$ and all $p \in [1,2]$. (As pointed out in [12], inequality (1.32)
obviously implies that $L^p$ is uniformly smooth, and in fact $p$-uniformly smooth.)
Actually, it is easy to see that Clarkson's inequality (1.32) is equivalent to the
symmetric case of (1.31), with $K = 1$.

As mentioned in [60], an inequality of the form (1.31) is not of optimal order
in $n$ for independent identically distributed real-valued zero-mean $X_i$'s and may
be used together with a Hölder bound such as $\mathrm{E}|S_n|^p \le (\mathrm{E} S_n^2)^{p/2}$. Using
similar considerations together with symmetrization and truncation, Manstavichyus
[34] obtained bounds on $\mathrm{E}|S_n|^p$ from above and below, which differ from each
other by an (unspecified) factor depending only on $p$. The proof of Theorem 1.1
(and especially that of part (II) of Lemma 2.5) shows that near-extremal r.v.'s
$X_1, \dots, X_n$, for which the constant $C$ in (1.2) cannot be non-negligibly less than
$C_f$, are as follows: $X_1$ and $X_2$ are independent, zero-mean, and both highly
skewed in the same direction (both to the right or both to the left); $|X_2|$ is
much smaller than $|X_1|$; and $X_3, \dots, X_n$ are zero or nearly so. This suggests
that the inequality (1.31) should be most useful for independent real-valued
zero-mean $X_i$'s when the distributions of the $X_i$'s are quite different from one
another and/or highly skewed and/or heavy-tailed.
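
The description of near-extremal r.v.'s can be illustrated with exact expectations for two independent two-point zero-mean laws. The particular construction below (a two-point $X_1$ sitting at $x_{3/2}$ with probability $1 - \varepsilon$, and a two-point $X_2$ on $\{-1, b\}$ with small $b$) is an assumption made here for illustration and is not taken from the paper's proof; it only shows how the ratio $(\mathrm{E} f(X_1 + X_2) - \mathrm{E} f(X_1)) / \mathrm{E} f(X_2)$ can come close to $C_p$ for $p = 3/2$.

```python
import math

p = 1.5
f = lambda u: abs(u) ** p
C_p = math.sqrt(1.0 + 1.0 / math.sqrt(2.0))      # C_{3/2}, Proposition 1.8(iv)
x_star = (2.0 - math.sqrt(2.0)) / 4.0            # x_{3/2}, Proposition 1.8(iv)

def two_point(neg, pos):
    """Zero-mean law on {-neg, pos}: P(pos) = neg/(neg+pos), P(-neg) = pos/(neg+pos)."""
    return [(pos, neg / (neg + pos)), (-neg, pos / (neg + pos))]

def mean_f(dist):
    return sum(pr * f(v) for v, pr in dist)

def mean_f_sum(d1, d2):
    """E f(X1 + X2) for independent discrete X1, X2."""
    return sum(p1 * p2 * f(v1 + v2) for v1, p1 in d1 for v2, p2 in d2)

def ratio(eps, b):
    """(E f(X1 + X2) - E f(X1)) / E f(X2); by (1.11) with n = 2 it is at most C_p."""
    X1 = two_point(x_star * (1 - eps) / eps, x_star)   # equals x_star w.p. 1 - eps
    X2 = two_point(1.0, b)                              # equals b w.p. 1 / (1 + b)
    return (mean_f_sum(X1, X2) - mean_f(X1)) / mean_f(X2)
```

Both laws are skewed to the left (a rare large negative value), and the ratio approaches $C_p$ as `eps` and `b` decrease; for symmetric laws, e.g. `ratio(0.5, 1.0)`, it stays well below $C_p$.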

Again in the case when the differences $X_1, \dots, X_n$ are independent
zero-mean r.v.'s, von Bahr and Esseen [60] made an effort to improve their constant
$K = 2 - \frac1n$ in (1.31). For such $X_j$'s and the values of $p$ in a left neighborhood of
2 such that

\[ D(p) := \frac{13.52}{\pi\,(2.6)^p}\,\Gamma(p)\sin\frac{\pi p}{2} = \frac{2}{\pi}\Bigl(\frac{13}{5}\Bigr)^{2-p}\Gamma(p)\sin\frac{\pi p}{2} < 1, \]

they showed that (1.31) holds with $K = C_p^{\mathrm{vBE}} := \frac{1}{(1 - D(p))_+}$, assuming the
convention $\frac10 := \infty$; in fact, the constant factor $C_p^{\mathrm{vBE}}$ may improve on (i.e., may
be less than) the factor $2 - \frac1n$ only for values of $p$ in a left neighborhood of 2
such that $D(p) < \frac12$. It is stated (without proof) in [60] that $D(p)$ decreases in
$p \in (1,2)$ and that the mentioned left neighborhood contains the interval $[1.6, 2]$;
cf. Figure 4, where the von Bahr-Esseen constant factor $2 \wedge C_p^{\mathrm{vBE}}$ is compared
with the optimal (for (1.2)) constant factor $C_p$. (There are a couple of typos in
[60]: in [60, (11)], one should have $\pi(2.6)^r$ instead of $(\pi 2.6)^r$, and also the
expression [60, (12)] for $D(p)$ should have $\pi(2.6)^r$ instead of $(\pi 2.6)^r$.)
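
The quantities in this comparison are easy to tabulate. The sketch below assumes the form $D(p) = \frac{2}{\pi}\bigl(\frac{13}{5}\bigr)^{2-p}\Gamma(p)\sin\frac{\pi p}{2}$, i.e. the version with $\pi(2.6)^p$ in the denominator as indicated by the remark on typos; the function names are hypothetical. With this form, $D(1.6)$ falls just below $\frac12$, in agreement with the statement about the interval $[1.6, 2]$.

```python
import math

def D(p):
    """D(p) = (2/pi) * (13/5)^(2-p) * Gamma(p) * sin(pi p / 2); assumed form."""
    return (2.0 / math.pi) * 2.6 ** (2.0 - p) * math.gamma(p) * math.sin(math.pi * p / 2.0)

def C_vbe(p):
    """C_p^vBE = 1 / (1 - D(p))_+, with the convention 1/0 := infinity."""
    gap = max(1.0 - D(p), 0.0)
    return math.inf if gap == 0.0 else 1.0 / gap

def W(p):
    """W_p = 2^(2 - p)."""
    return 2.0 ** (2.0 - p)

# Rows of (p, D(p), C_p^vBE, W_p) on a few sample exponents.
rows = [(p, D(p), C_vbe(p), W(p)) for p in (1.0, 1.3, 1.6, 1.9, 1.99)]
```

On these sample exponents the vBE factor exceeds $W_p$, which is consistent with the comparison of constants made later in this subsubsection.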

The method of [60] is based on a representation of the absolute moment $\mathrm{E}|X|^p$
of a r.v. $X$ as a certain integral transform of the Fourier transform of the
distribution of $X$. More general representations, for the positive-part moments
$\mathrm{E} X_+^p$, were obtained in [11, 44].

Take now again any $p \in (1,2]$. Woyczyński [61] considered the class $G_{p-1}$ of
Banach spaces $\mathfrak{X}$ defined by the following condition: there exist a map
$G\colon \mathfrak{X} \to \mathfrak{X}^*$ and a constant $A$ such that for all $x$ and $y$ in $\mathfrak{X}$ one has (i)
$\|G(x)\| = \|x\|^{p-1}$, (ii) $G(x)x = \|x\|^p$, and (iii) $\|G(x) - G(y)\| \le A\|x - y\|^{p-1}$.
The class $G_1$ was introduced by Fortet and Mourier [22]. Hoffmann-Jørgensen
[25] proved that $\mathfrak{X} \in G_{p-1}$ iff $\mathfrak{X}$ is $p$-uniformly smooth.

Woyczyński [61] showed that inequality (1.31) holds for any independent
zero-mean random vectors $X_1, \dots, X_n$ in any Banach space $\mathfrak{X} \in G_{p-1}$, with $|\cdot|$
and $K$ replaced by $\|\cdot\|$ and $A$. As noted in [61], the space $L^p$ is in $G_{p-1}$, with
the constant $A = 2$; at that, one should take $G(x) = x^{[p-1]} := |x|^{p-1}\operatorname{sign} x \in
L^{p/(p-1)} = (L^p)^*$ for all $x \in L^p$. It is not hard to see that the best possible constant
$A = A_{p,\mathfrak{X}}$ for $\mathfrak{X} = L^p$ is

\[ W_p := \sup_{u \in (-1,1)} \frac{1 - u^{[p-1]}}{(1 - u)^{p-1}} = 2^{2-p}, \]

which is in agreement with the definition of $W_p$ in part (v) of Proposition 1.8.
Thus, one has (1.31) with $K = W_p = 2^{2-p}$ for independent zero-mean differences
$X_1, \dots, X_n$, which may be either real-valued or, equivalently, with values in $L^p$
(in which case $|\cdot|$ is replaced by $\|\cdot\|_p$). The constant $K = W_p$ in (1.31) is not
the best possible one, even for independent zero-mean real-valued $X_1, \dots, X_n$,
even if $n$ is not fixed; indeed, by part (v) of Proposition 1.8, $W_p > C_p$. On the
other hand, the following proposition takes place.

Proposition 1.16. One has $C_p^{\mathrm{vBE}} > W_p$ for all $p \in [1,2)$.

So, $C_p^{\mathrm{vBE}} > W_p > C_p$ for all $p \in (1,2)$. This comparison is illustrated in
Figure 4.



2. Proofs 

This section consists of four subsections. In Subsection 2.1, we shall prove 5 
propositions, of the 8 ones stated in Section 1; three of these 5 propositions will 
be used in the proof of Theorem 1.1, in Subsection 2.4. The proof of Proposi- 
tion 1.8 (which is also used in the proof of Theorem 1.1) is more involved than 
those of the other propositions, and it will be presented separately, in Subsec- 
tion 2.2. Corollary 1.9 and the related Propositions 1.10 and 1.11 will be proved 
in Subsection 2.3. 

The main difficulty in proving Theorem 1.1 (in distinction with discovering it) is that $C_f$ is not an absolute constant, as it depends on $f$; moreover, being the supremum of a family of ratios of linear forms in $f$, the factor $C_f$ is nonlinear in $f$ (recall here also Remark 1.7), and thus the right-hand side of the inequality (1.2) is "even more" nonlinear. Besides, the dependence of $C_f$ on $f$ can in general be described only implicitly. Recall also that the class $\mathcal{F}_{1,2}$ of moment functions is an extension of the family of power functions with a small exponent $p$ (in the interval $(1,2]$), which results in rather poor differentiability/convexity properties of the moment functions; cf. e.g. Haagerup's case [24, 40, 42] of $p \in (0,2) \cup (2,3)$ for the optimal Khinchin-type upper and lower bounds.

However, there are some favorable circumstances. First, while linearity in $f$ is lacking, the linearity of both sides of inequality (1.2) with respect to the distribution of the v-martingale is there, and therefore it turns out to be possible to reduce inequality (1.2) to a comparison between two quadratic forms in $f$, which in turn can be reduced to a comparison of the coefficients of these quadratic forms, as displayed in formulas (2.22), (2.24), (2.25), and (2.26). The latter comparison is already between two piecewise algebraic (even piecewise polynomial) expressions, due to the piecewise polynomial nature of the extreme functions $\psi_t$. The polynomials are "4 × 4": each in 4 variables, of degree up to 4 in each variable. Some of these polynomials are not quite easy to analyze by hand. Moreover, the same piecewise nature of the functions $\psi_t$ results in a very large number of cases to consider. After all the reductive steps described above, as well as a number of others, one still has to consider 432 such 4 × 4 polynomials. While the work on each of these polynomials seems rather routine, it is a huge amount of symbolic calculations, and a good amount of numerical calculations too. In such a situation, it appears reasonable to use a computer.

A well-known result by Tarski [58, 14, 33, 15] (rooted in Sturm's theorem) implies that systems of algebraic equations/inequalities can be solved in a completely algorithmic manner. Similar results hold for algebraic-hyperbolic polynomials (that is, polynomials in $x$, $e^x$, $e^{-x}$), as well as for certain other expressions involving inverse-trigonometric and inverse-hyperbolic functions (including the logarithmic function), whose derivatives are algebraic; this latter fact will be implicitly used in the proof of Proposition 1.8. Such algorithms are implemented in Mathematica via Reduce and other related commands. In particular, the command

    Reduce[cond1 && cond2 && ..., {var1, var2, ...}, Reals]

returns a simplified form of the given system (of equations and/or inequalities) cond1, cond2, ... over real variables var1, var2, .... However, the execution of such a command may take a very long time (and/or require too much computer memory) if the given system is more than a little complicated, as is e.g. the case with the system (2.26)-(2.27) in Subsection 2.4, even after it is reduced to the 432 simpler problems, with the 432 polynomials. Hence, Mathematica will need some human help here. To keep the expressions manageable, it is important to try to simplify the expression at each step of the calculation; this can be done e.g. using Mathematica commands of the form Assuming[cond, # // Simplify] &. It appears that all such calculations done with a computer are at least as reliable and rigorous as the same calculations done by hand.
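To make the remark about Sturm's theorem concrete, here is a minimal sketch (in Python, not the Mathematica used in the paper) of the one-variable case: counting the distinct real roots of a polynomial in an interval by exact rational arithmetic. It only illustrates the kind of algorithmic decision procedure mentioned above and is not the algorithm behind Reduce; all function names are hypothetical, and the polynomial is assumed to have degree at least 1 and not to vanish at the endpoints.

```python
from fractions import Fraction


def poly_eval(p, x):
    """Horner evaluation; p lists coefficients from the highest degree down."""
    acc = Fraction(0)
    for coeff in p:
        acc = acc * x + coeff
    return acc


def poly_derivative(p):
    degree = len(p) - 1
    return [coeff * (degree - i) for i, coeff in enumerate(p[:-1])]


def poly_rem(a, b):
    """Remainder of a modulo b (b must have a nonzero leading coefficient)."""
    a = list(a)
    while len(a) >= len(b):
        factor = a[0] / b[0]
        for i in range(len(b)):
            a[i] -= factor * b[i]
        a.pop(0)  # the leading coefficient has just been cancelled
    while a and a[0] == 0:
        a.pop(0)
    return a


def sturm_chain(p):
    """p, p', and then the negated remainders of the successive divisions."""
    chain = [p, poly_derivative(p)]
    while True:
        rem = poly_rem(chain[-2], chain[-1])
        if not rem:
            return chain
        chain.append([-coeff for coeff in rem])


def sign_changes(chain, x):
    values = [v for v in (poly_eval(q, x) for q in chain) if v != 0]
    return sum(1 for u, v in zip(values, values[1:]) if (u > 0) != (v > 0))


def count_real_roots(coeffs, a, b):
    """Number of distinct real roots in (a, b), assuming p(a) != 0 != p(b)."""
    p = [Fraction(c) for c in coeffs]
    chain = sturm_chain(p)
    return sign_changes(chain, Fraction(a)) - sign_changes(chain, Fraction(b))
```

For instance, `count_real_roots([1, 0, -3, 1], -2, 2)` counts the real roots of $x^3 - 3x + 1$ in $(-2, 2)$. Because every step uses exact fractions, the answer is a proof-grade count rather than a numerical estimate, which is the point of the remark on reliability.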

2.1. Proofs of Propositions 1.2, 1.3, 1.4, 1.6, and 1.16

Proof of Proposition 1.2. To begin, note that
$$\psi_t'(x) = 2(t \wedge x) \tag{2.1}$$
for all $x \in [0,\infty)$ and $t \in (0,\infty)$. Take any $f \in \mathcal{F}_{1,2}$. Then, by (1.1) and the right continuity of the monotonic right derivative $f''$ of $f'$, the relation (1.8) defines a nonnegative Borel measure $\gamma = \gamma_f$ on $(0,\infty]$ and, by Fubini's theorem,
$$f'(x) = \int_0^x f''(u)\,du = 2\int_0^x du \int_{(u,\infty]} \gamma(dt) = 2\int_{(0,\infty]} \gamma(dt) \int_0^{t \wedge x} du = 2\int_{(0,\infty]} (t \wedge x)\,\gamma(dt) \tag{2.2}$$
for all $x \in [0,\infty)$. In particular, this proves part (III) of the proposition and (taken with $x = 1$) implies the condition $\int_{(0,\infty]} (t \wedge 1)\,\gamma(dt) < \infty$ in part (I) of the proposition. Further, for all $x \in [0,\infty)$, (2.2) yields
$$f(x) = \int_0^x f'(u)\,du = 2\int_0^x du \int_{(0,\infty]} (t \wedge u)\,\gamma(dt) = 2\int_{(0,\infty]} \gamma(dt) \int_0^x (t \wedge u)\,du,$$
which implies (1.6), since $\int_0^x (t \wedge u)\,du = \frac12 \psi_t(x)$ for all $x \in [0,\infty)$ and $t \in (0,\infty]$. This proves the "only if" implication in part (I) of the proposition, since the functions $f$ and $\psi_t$ are even.

To prove the "if" implication, assume that (1.6) holds for some nonnegative Borel measure $\gamma$ on $(0,\infty]$ such that $\int_{(0,\infty]} (t \wedge 1)\,\gamma(dt) < \infty$ and for all $x \in \mathbb{R}$. In view of (2.1), the condition $\int_{(0,\infty]} (t \wedge 1)\,\gamma(dt) < \infty$ implies that the integral $\int_{(0,\infty]} \psi_t'(x)\,\gamma(dt)$ converges uniformly over all $x$ in any given compact subset of the interval $(0,\infty)$. So, one finds that (1.6) implies (1.9), which in turn implies that $f'$ is nondecreasing and concave on $[0,\infty)$ (because the function $\psi_t'$ is so, for each $t \in (0,\infty)$). It is also easy to see that $f \in C^1(\mathbb{R})$, $f(0) = 0$, and $f$ is even. Thus, it is checked that $f \in \mathcal{F}_{1,2}$, which completes the proof of the "if" implication in part (I) of the proposition.

It remains to prove part (II). Take indeed any $f \in \mathcal{F}_{1,2}$. Take also any nonnegative Borel measure $\gamma$ on $(0,\infty]$ such that $\int_{(0,\infty]} (t \wedge 1)\,\gamma(dt) < \infty$ and (1.6) holds for all $x \in \mathbb{R}$. We have to show that (1.8) takes place for all $x \in (0,\infty)$. Take indeed any such $x$. Then, as has been shown, one has identities (1.9). Therefore, for any $h \in (0,\infty)$
$$\frac12\,\frac{f'(x+h) - f'(x)}{h} = \int_{(0,\infty]} r_t(x,h)\,\gamma(dt), \tag{2.3}$$
where $r_t(x,h) := \frac1h\bigl[((x+h) \wedge t) - (x \wedge t)\bigr]$, which is bounded (between 0 and 1) and converges to $\mathbf{1}\{t > x\}$ as $h \downarrow 0$. So, (1.8) follows from (2.3) by dominated convergence. This completes the proof of part (II) of the proposition as well. □
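As an illustration of the representation (2.2), one may take the power function $f = |\cdot|^p$ with $p \in (1,2)$. The computation below is only a sketch: it assumes that (1.8) reads $f''(x) = 2\gamma((x,\infty])$ for $x > 0$, which is how (1.8) is used in the derivation of (2.2); the explicit density of $\gamma$ is obtained from that assumption and is not quoted from the paper.

```latex
\begin{align*}
  f''(x) &= p(p-1)\,x^{p-2} = 2\gamma\bigl((x,\infty]\bigr)
  \quad\Longrightarrow\quad
  \gamma(dt) = \tfrac12\,p(p-1)(2-p)\,t^{p-3}\,dt \ \text{ on } (0,\infty),
  \qquad \gamma(\{\infty\}) = 0, \\
  2\int_{(0,\infty]} (t\wedge x)\,\gamma(dt)
  &= p(p-1)(2-p)\Bigl[\int_0^x t^{p-2}\,dt + x\int_x^\infty t^{p-3}\,dt\Bigr] \\
  &= p(p-1)(2-p)\Bigl[\frac{x^{p-1}}{p-1} + \frac{x^{p-1}}{2-p}\Bigr]
   = p\,x^{p-1}\bigl[(2-p) + (p-1)\bigr] = p\,x^{p-1} = f'(x).
\end{align*}
```

Both integrals converge exactly because $1 < p < 2$, which is also what makes $\int_{(0,\infty]} (t \wedge 1)\,\gamma(dt)$ finite in this example.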

Proof of Proposition 1.3. Part (ii) of the proposition is obvious on recalling that 
Xj = q 2J — 1 for j £ 1, 00. Note also that p((xj + I) 4 / 3 — l) = | for j £ 1, 00. 
So, to prove then part (iii), it is enough to show that p e s (r) decreases from | to 
I and then increases back to | as r increases from 1 to | and then to 2, which 
follows because the expressions 2 — ^ and 1 + ^ are, respectively, increasing 
and decreasing in r £ [1,2], and they are equal to each other at r = |. 
It remains to prove part (i) of the proposition, which is equivalent to 

/ait(x)=.^ rffM+o(1) (2.4) 

as x — > oo, where r :— p(x) e (1,2], so that x — q r2,J — 1. In other words, 
it suffices to prove that the convergence (2.4) with x = q r2 ' — 1 takes place 
uniformly in r £ (1, 2] as j — > oo. Assume indeed that j — > oo and x = q r21 — 1. 
Introduce yj := Xj + 1, so that yj = q 2 ' for j = 1,2,.... Then x = yj + ° , and 
uniformly over all k G {0, . . . , j— 1} one has x— \{xk+Xk+i) = x 1+ °^; moreover, 
if at that k — > oo then Xk+i — Xk = x)^^ = Uk + ■> which shows that the fcth 
summand in the sum J2k=o ■ • • m (1-10) is (xyf. 2 ' 3 } 1 +°( 1 ) — {y r j as 
k -> oo. So, the sum ECo ■ • ■ in ( L10 ) is (l£»J-i) 1+o(1) = (l£ Z/f /3 ) 1+o(1) = 

r+|+o(l) 

; 

To estimate the difference x — Xj, which appears on the right-hand side of 
(1.10), we need to distinguish two possible cases: r € [1, §) and r S [f,2]. 
Uniformly over all re [§,2] one has x — Xj = = y^ + \ so that the term 

on the right-hand side of (1.10) before the sum Xito ■ ■ ■ ls Vj 3+°( 1 )^ wn j cn 

• f , \ 2r-|+o(l) r+|+o(l) (2r-|)V(r+|)+o(l) rp oft (r)+o(l) 

yields / a it(a;) = ^ 3 y ' + y- 3 =2/} 3 3 = y/ = 
a; Peff(r)+o(i) j as in (2.4). 

It remains to consider the values re [1, |). For such values of r, the relation 
x — Xj = x 1+ °^ no longer holds; for instance, x — Xj = if r = 1. However, 
in this case one can obviously write x — Xj x and also p e s{r) = 1 + 
^ > 2 — ^. So, the term on the right-hand side of (1.10) before the sum 

Ei=o ■ • ■ is ^ 2/ J 2r " 3+ ° (1) ^ 2/j + 3+ ° (1) , whereas still Yjj^o ' ' ' = y J r+ ' + ° (1) ; so, 

r+|+o(l) . , A r+f+o(l) . r+|+o(l) , t f \ r +§+o(l) 

Vj 3 ^ /alt (a) < 2/j + J/j , whence / a it(z) = 2/,- 3 = 

y rp ott (r)+o(i) = a .p sH (r)+o(i) ) thus proving (2.4) uniformly over all r € [1, |) as 
well. □ 

Proof of Proposition 1.4.

(i) Since the function $f$ is nonzero, the set $\operatorname{supp}\gamma$ is a nonempty subset of $(0,\infty]$. So, $s_f = \inf\operatorname{supp}\gamma \in [0,\infty]$. If $s_f = \infty$ then $\operatorname{supp}\gamma = \{\infty\}$, which implies, in view of (1.6), that $f = \psi_\infty$, which contradicts the assumption on $f$ in Proposition 1.4. This proves part (i) of the proposition.

(ii) Take any $s \in (0, s_f]$ and $t \in \operatorname{supp}\gamma$, so that $t \in [s_f, \infty]$. Then $s_f > 0$ and it is straightforward to check that $L_{\psi_t;s}(x) = \psi_t(s)$ for any $x \in (0,s)$. Hence, by (1.6) and (1.9),
$$L_{f;s}(x) = \int_{(0,\infty]} L_{\psi_t;s}(x)\,\gamma(dt) = \int_{(0,\infty]} \psi_t(s)\,\gamma(dt) = f(s),$$
which proves part (ii) of Proposition 1.4.






(iii) Take any $s \in (s_f, \infty)$. Then $L'_{\psi_t;s}(0+) = 2(s-t)_+$ for any $t \in (0,\infty]$. So, by (1.9) and (1.8),
$$L'_{f;s}(0+) = \int_{(0,\infty]} L'_{\psi_t;s}(0+)\,\gamma(dt) = 2\int_{(0,\infty]} (s-t)_+\,\gamma(dt) > 0,$$
since for any $s \in (s_f,\infty)$ one has $\gamma((0,s)) > 0$. Similarly,
$$L'_{f;s}(s-) = \int_{(0,\infty]} L'_{\psi_t;s}(s-)\,\gamma(dt) = -2\int_{(0,\infty]} t\,\mathbf{1}\{t < s\}\,\gamma(dt) < 0.$$
This proves part (iii) of Proposition 1.4.

(iv) In view of the rescaling identity $L_{f;s}(x) = L_{f_s;1}(\frac{x}{s})$ with $f_s(u) := f(su)$, without loss of generality (w.l.o.g.) $s = 1$. Then part (iv) of the proposition follows by parts (ii) and (iii) and the observation that $\ell_f(z) := L_{f;1}(1 - \sqrt{z})$ is concave in $z \in (0,1)$. In view of (1.6), it is enough to prove this observation for $f = \psi_t$ with $t \in (0,\infty]$; at that, by part (ii) of Proposition 1.4 and because $s_{\psi_t} = t$, w.l.o.g. let us assume that $0 < t < s = 1$. Observe that the second derivative $\ell''_{\psi_t}(z)$ in $z$ admits of a piecewise-algebraic expression, which may be quickly obtained by using the Mathematica command PiecewiseExpand. Applying then a Reduce command, one finds that $\ell''_{\psi_t}(z) \le 0$ for all $t \in (0,1)$ and $z \in (0,1)$. Now part (iv) of Proposition 1.4 follows.

(v) Part (v) of the proposition follows by parts (i)-(iv), on recalling (1.3) and taking into account that $L_{f;s}(0+) = f(s)$, for all $s \in (0,\infty)$.

Proposition 1.4 is now completely proved. □

Proof of Proposition 1.6. Take any $t \in (0,\infty]$. That $C_{\psi_\infty} = 1$ follows immediately by (1.3). So, w.l.o.g. $t \in (0,\infty)$, and then, by (1.3) and homogeneity, w.l.o.g. $t = 1$. Thus, it remains to show that $C_{\psi_1} = 2$. Take any $s \in (s_{\psi_1}, \infty) = (1,\infty)$ and observe that $L'_{\psi_1;s}(1+) = -2(s \wedge 2) < 0$, whereas $L'_{\psi_1;s}(1-) = -2(s \wedge 2) + 2s \ge 0$. Therefore, by part (iv) of Proposition 1.4, $\max_{x \in (0,s)} L_{\psi_1;s}(x) = L_{\psi_1;s}(1) = s^2 - (s-2)_+^2$. Now, using part (v) of Proposition 1.4, it is easy to see that
$$C_{\psi_1} = \sup_{s \in (1,\infty)} \frac{s^2 - (s-2)_+^2}{s^2 - (s-1)_+^2} = \lim_{s \to \infty} \frac{s^2 - (s-2)_+^2}{s^2 - (s-1)_+^2} = 2. \qquad\square$$
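The last step of the proof of Proposition 1.6 can be checked numerically. The sketch below (Python; the function name is hypothetical) evaluates the ratio $(s^2 - (s-2)_+^2)/(s^2 - (s-1)_+^2)$ under the sup sign and confirms on a grid that it increases in $s > 1$, stays below 2, and approaches 2 for large $s$. It is a sanity check of the displayed supremum only, not a replacement for the argument.

```python
def ratio(s):
    """(s^2 - (s-2)_+^2) / (s^2 - (s-1)_+^2) for s > 1."""
    numerator = s ** 2 - max(s - 2.0, 0.0) ** 2
    denominator = s ** 2 - max(s - 1.0, 0.0) ** 2
    return numerator / denominator


# Values on a grid of s in (1, 21]; they increase towards the supremum 2.
grid_values = [ratio(1.0 + k / 100.0) for k in range(1, 2001)]
```

For $s > 2$ the ratio equals $(4s-4)/(2s-1)$, which makes both the monotonicity and the limit 2 visible without any computation.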

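Proof of Proposition 1.16. Take any $p \in [1,2)$. It suffices to show that
$$\beta(p) := (1 - D(p))\,2^{2-p} < 1. \tag{2.5}$$
Observe that
$$\beta'(p) = -2^{2-p}\ln 2 + \frac{\Gamma(p)}{\pi}\Bigl(\frac{26}{5}\Bigr)^{2-p}\Bigl[2\Bigl(\sin\frac{\pi p}{2}\Bigr)\Bigl(\ln\frac{26}{5} - (\ln\Gamma)'(p)\Bigr) - \pi\cos\frac{\pi p}{2}\Bigr] > -2^{2-p}\ln 2 \ge -2\ln 2 > -1.4;$$
the first inequality here follows because $\cos\frac{\pi p}{2} \le 0$, $\sin\frac{\pi p}{2} > 0$, and $\ln\frac{26}{5} - (\ln\Gamma)'(p) \ge \ln\frac{26}{5} - (\ln\Gamma)'(2) > 0$, taking into account that $\ln\Gamma$ is convex and hence $(\ln\Gamma)'$ is increasing. It is easy to see that $\max\{\beta(1 + \frac{i}{4})\colon i \in \overline{1,2}\} < 1 - 0.49$. So, $\beta(p) < \beta(1 + \frac{i}{4}) + (1.4)\frac14 < 1 - 0.49 + (1.4)\frac14 < 1$ for $p \in [1 + \frac{i-1}{4}, 1 + \frac{i}{4}]$ and $i \in \overline{1,2}$; thus, (2.5) holds for all $p \in [1, \frac32]$.

Next,
$$\beta_2(p) := 25\pi\,\beta''(p)\,2^{p-1} = A + B(E_1 + E_2 + E_3 + E_4),$$
where
$$A := 50\pi\ln^2 2, \qquad B := 169\,\Gamma(p)\Bigl(\frac{5}{13}\Bigr)^p,$$
$$E_1 := 4\pi\Bigl(\cos\frac{\pi p}{2}\Bigr)\ln\frac{26}{5}, \qquad E_2 := \kappa\sin\frac{\pi p}{2},$$
$$E_3 := -4\bigl((\ln\Gamma)'(p)^2 + (\ln\Gamma)''(p)\bigr)\sin\frac{\pi p}{2},$$
$$E_4 := (\ln\Gamma)'(p)\Bigl(-4\pi\cos\frac{\pi p}{2} + 8\ln\frac{26}{5}\,\sin\frac{\pi p}{2}\Bigr),$$
and $\kappa := \pi^2 - 4\ln^2 2 - 4\ln^2\frac{13}{5} - 8\ln 2\,\ln\frac{13}{5} < 0$, whence $E_2 < 0$. Also, $E_3 < 0$, because $(\ln\Gamma)'' > 0$. Let us next bound $E_1$ and $E_4$ from above, assuming that $p \in [\frac32, 2]$. Then $E_1 \le 4\pi\cos(\pi\frac34)\ln\frac{26}{5} < -14.6$; also, $(\ln\Gamma)'(p) \ge (\ln\Gamma)'(\frac32) > 0$ and $(\ln\Gamma)'(p) \le (\ln\Gamma)'(2)$, so that $E_4 \le (\ln\Gamma)'(2)\,(4\pi + 8\ln\frac{26}{5}) < 10.9$. Thus, for all $p \in [\frac32, 2]$
$$\beta_2(p) < 50\pi\ln^2 2 + 169\Bigl(\frac{5}{13}\Bigr)^2(-14.6 + 10.9) < -6 < 0$$
and hence $\beta''(p) < 0$, so that $\beta$ is strictly concave on $[\frac32, 2]$. At that, $\beta(2) = 1$ and $\beta'(2) = 1 - \ln 2 > 0$; so, (2.5) holds for all $p \in [\frac32, 2)$ as well. □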
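The numerical claims in the proof of Proposition 1.16 can be spot-checked as follows. This is a hedged sketch: it assumes that $D(p)$ is the von Bahr-Esseen expression in its corrected form $D(p) = 13.52\,\Gamma(p)\sin(\pi p/2)/(\pi\,2.6^p)$, which is inferred from the constants appearing in the proof (for instance $\beta'(2) = 1 - \ln 2$) rather than quoted; the function names are hypothetical.

```python
import math


def D(p):
    """Assumed form of the von Bahr-Esseen D(p); see the hedge in the lead-in."""
    return 13.52 * math.gamma(p) * math.sin(math.pi * p / 2) / (math.pi * 2.6 ** p)


def beta(p):
    """beta(p) = (1 - D(p)) * 2^(2 - p), as in (2.5)."""
    return (1.0 - D(p)) * 2.0 ** (2.0 - p)


def beta_prime(p, h=1e-6):
    """Central finite-difference approximation to beta'(p)."""
    return (beta(p + h) - beta(p - h)) / (2.0 * h)
```

A floating-point grid check of this kind is of course weaker than the derivative-bound argument in the proof, which covers every $p$ in $[1,2)$ and not just grid points.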

2.2. Proof of Proposition 1.8 

Of the 5 parts of the proposition, the most difficult to prove are parts (iii) and (v), which are based to a certain extent on several lemmas. To state these lemmas, we need more notation. Recall the definition of $\ell(p,x)$ in (1.13) and introduce
$$\ell_p(p,x) := \frac{\partial}{\partial p}\ell(p,x), \qquad \ell_x(p,x) := \frac{\partial}{\partial x}\ell(p,x), \qquad \ell_{x,x}(p,x) := \frac{\partial}{\partial x}\ell_x(p,x) = \frac{\partial^2}{\partial x^2}\ell(p,x)$$
and also
$$p^*_x := \tfrac14(25x + 2) \quad\text{and}\quad x^*_p := \tfrac{2}{25}(2p - 1),$$
so that $x = x^*_p \iff p = p^*_x$. Now we are ready to state the lemmas:

Lemma 2.1. For all $p \in (1,2)$ and $x \in (0,\frac12)$ one has $\ell_{x,x}(p,x) < 0$ and hence

Lemma 2.2. For all $p \in (1,2)$,
$$B(p) := 4(p-1)^{p-1} - (6-p)^{p-1} > 0. \tag{2.6}$$






Lemma 2.3. For all $p \in (1,2)$ and $x \in (0,\frac12)$ such that $x \ge x^*_p$, one has $\ell_x(p,x) < 0$.

Lemma 2.4. For all $p \in (1,2)$ and $x \in (0,\frac12)$ such that $x \le x^*_p$, one has $\ell_p(p,x) < 0$.

The proofs of these lemmas are deferred to the end of this subsection. Let us 
now consider the four parts of Proposition 1.8. 

(i,ii) Take any $p \in (1,2)$. Observe that $\ell_x(p, \frac{p-1}{2}) = 2^{1-p}\bigl((p-1)^{p-1} - (3-p)^{p-1}\bigr)p < 0$, since $p - 1 < 3 - p$. On the other hand, $\ell_x(p, \frac{p-1}{5}) = 5^{1-p}\,p\,B(p) > 0$, by Lemma 2.2. So, any value of $x_{f;s}$ as in part (iv) of Proposition 1.4 (for $f = |\cdot|^p$) must be in the interval $(\frac{p-1}{5}, \frac{p-1}{2}) \subset (0,1)$. By Lemma 2.1 and part (iii) of Proposition 1.4 (with $s_f = 0$), $\ell_x(p,x)$ is strictly decreasing in $x \in (0,\frac12)$ from a positive value to a negative one. Now, in view of part (v) of Proposition 1.4, parts (i) and (ii) of Proposition 1.8 follow, taking also into account that the equation (1.14) is equivalent to $\ell_x(p,x) = 0$.

(iii) By part (i) of Proposition 1.8, $x_p$ is the only root $x \in (0,\frac12)$ of the equation $\ell_x(p,x) = 0$, for each $p \in (1,2)$. So, by Lemma 2.1 and the implicit function theorem, $C_p$ is differentiable, and even real-analytic, and hence continuous in $p \in (1,2)$.

Next, by Lemma 2.3, for any $p \in (1,2)$ and $x \in (0,\frac12)$ the equality $\ell_x(p,x) = 0$ implies $x < x^*_p$, which in turn implies $\ell_p(p,x) < 0$, by Lemma 2.4. So, for any $p \in (1,2)$ one has $\ell_p(p,x_p) < 0$, whence $\frac{d}{dp}C_p = \frac{d}{dp}\ell(p,x_p) = \ell_p(p,x_p) + \ell_x(p,x_p)\frac{d}{dp}x_p = \ell_p(p,x_p) < 0$, which verifies that $C_p$ is decreasing in $p \in (1,2)$.

Thus, to complete the proof of part (iii) of the proposition, it remains to show that $C_{1+} = 2$ and $C_{2-} = 1$ (recall that $C_2 = 1$, by (1.12)). Here, consider first the case $p \downarrow 1$. Observe that then $\ell(p, p-1) = (2-p)^p - (p-1)^p + p(p-1)^{p-1} \to 2$; on the other hand, by (1.5), $C_p \le 2$ for all $p \in (1,2]$. It indeed follows that $C_{1+} = 2$. Next, for all $x \in (0,1)$ and $p \in (\frac32, 2)$, one has $\ell(2,x) = 1$ and $|x^p\ln x| < |x^{p-1}\ln x| < |x^{1/2}\ln x| \le \frac2e < 1$, whence
$$|\ell_p(p,x)| = \bigl|x^{p-1} + p\,x^{p-1}\ln x - x^p\ln x + (1-x)^p\ln(1-x)\bigr| < |x^{p-1}| + |p\,x^{p-1}\ln x| + |x^p\ln x| + |(1-x)^p\ln(1-x)| < 1 + 2 + 1 + 1 = 5;$$
so, letting $p \uparrow 2$, one has $\ell(p,x) = \ell(2,x) - \int_p^2 \ell_p(r,x)\,dr < 1 + 5(2-p) \to 1$, whence $\limsup_{p \uparrow 2} C_p = \limsup_{p \uparrow 2} \ell(p,x_p) \le 1$. It remains to refer, again, to (1.5).
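The structure of parts (i)-(iii) suggests a direct way to compute $C_p$ numerically: locate the root $x_p$ of $\ell_x(p,\cdot)$ by bisection on $(\frac{p-1}{5}, \frac{p-1}{2})$ and evaluate $\ell(p, x_p)$. The sketch below does this in Python. It assumes $\ell(p,x) = (1-x)^p - x^p + p\,x^{p-1}$, a form inferred from the expressions for $\ell(2,x)$ and $\ell_p(p,x)$ used in this proof, since definition (1.13) is not restated here; the function names are hypothetical.

```python
def ell(p, x):
    """Assumed form of l(p, x) from (1.13); see the hedge in the lead-in."""
    return (1 - x) ** p - x ** p + p * x ** (p - 1)


def ell_x(p, x):
    """Partial derivative of ell in x."""
    return -p * (1 - x) ** (p - 1) - p * x ** (p - 1) + p * (p - 1) * x ** (p - 2)


def x_root(p, iterations=200):
    """Bisection for the root of ell_x(p, .) in ((p-1)/5, (p-1)/2)."""
    lo, hi = (p - 1) / 5, (p - 1) / 2  # ell_x > 0 at lo and < 0 at hi
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if ell_x(p, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def C(p):
    """Numerical value of C_p = ell(p, x_p) for p in (1, 2)."""
    return ell(p, x_root(p))
```

On a grid of exponents the computed values decrease in $p$, lie strictly between 1 and 2, and stay below $W_p = 2^{2-p}$, in line with parts (iii) and (v).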

(iv) The proof of part (iv) of the proposition is straightforward. 

(v) The equalities $C_{1+} = W_{1+}$ and $C_2 = C_{2-} = W_{2-} = W_2$, and the similar equalities for the upper and lower bounds $C_p^{-,1}$, $C_p^{-,2}$, $C_p^{+,1}$, and $C_p^{+,2}$ on $C_p$ follow immediately by part (iii) of the proposition. Take now any $p \in (1,2)$. Consider $\tilde\ell(p,z) := \ell(p, 1 - \sqrt{z})$, where $z \in (0,1)$. By parts (i) and (ii) of Proposition 1.8,
$$C_p = \max_{z \in (0,1)} \tilde\ell(p,z) = \max_{z \in (z_1,z_2)} \tilde\ell(p,z),$$
where $z_1 := z_1(p) := (\frac{3-p}{2})^2$ and $z_2 := z_2(p) := (\frac{6-p}{5})^2$ (since the values $\frac{p-1}{2}$ and $\frac{p-1}{5}$ of $x$ correspond, respectively, to the values $z_1$ and $z_2$ of $z$ under the correspondence given by the formula $x = 1 - \sqrt{z}$). Hence, $C_p \ge \tilde\ell(p,z_1) \vee \tilde\ell(p,z_2) = C_p^{-,1} \vee C_p^{-,2}$, which proves the first inequality in (1.15). It follows from the proof of part (iv) of Proposition 1.4 that $\tilde\ell(p,z)$ is concave in $z \in (0,1)$. Also, in the proof of parts (i) and (ii) of the proposition it was observed that $\ell_x(p, \frac{p-1}{5}) > 0 > \ell_x(p, \frac{p-1}{2})$, which is equivalent to $\tilde\ell_z(p,z_2) < 0 < \tilde\ell_z(p,z_1)$, where $\tilde\ell_z := \frac{\partial\tilde\ell}{\partial z}$. Therefore, $\tilde\ell(p,z) \le \tilde\ell(p,z_1) + \tilde\ell_z(p,z_1)(z - z_1) < \tilde\ell(p,z_1) + \tilde\ell_z(p,z_1)(z_2 - z_1) = C_p^{+,1}$ and $\tilde\ell(p,z) \le \tilde\ell(p,z_2) + \tilde\ell_z(p,z_2)(z - z_2) < \tilde\ell(p,z_2) + \tilde\ell_z(p,z_2)(z_1 - z_2) = C_p^{+,2}$ for all $z \in (z_1,z_2)$, which yields the second inequality in (1.15). The third inequality in (1.15) is trivial.

So, it remains to prove the last inequality in (1.15). It is enough to show that 
p(p) < 0, where 

pip) :=2x5 p (C+' 2 ~2 2 - p ) 

= A{p) + l^p(p-l)B(jp), 
A(p) 10p(p - If" 1 - 2(p - l) p - 2 3 - p 5 p + 2(6 - p) p , 

and B{p) is as in (2.6). Observe next that 27 — 7p ^ ||(6 — p) 2 . Hence and in 
view of Lemma 2.2, 

4p(p) ^ p(p) := AAip) + ^(6-p) P ip-l)Bip); 

thus, it suffices to show that pip) < 0, which can be rewritten as p(r) < for 
re (0, |), where 

p(r) : =16(|) 1+ l^(l + |r). 

One has 

Pl is) := P'(r)il±^! = A^s) + 4Bx(s)s^ , 

where 

A^s) := 16(-62 + 2202s + 1160s 2 + 121s 3 ) + 80(40 + 382s + 105s 2 + 8s 3 ) In 
Bxis) := 1572 - 367s - 795s 2 - 81s 3 + (-1310s + 75s 2 + 160s 3 ) In 

and s := | — 1, so that r = j^, and r G (0, |) iff s > 4. Using a Reduce 
command, one finds that Bxis) switches in sign from — to + as s increases from 
4 to co, and the switch occurs at a certain point s* = 31.4 ... . With 

- M ._ Pi(s) _ Axjs) 

P^ S >- s 5/(l+ S ) Sl ( s ) s 5/(l+ S ) Bl ( s ) + 4 ' 

another Reduce command shows (in about 12 sec) that 

p 2 is) :=p[i s )Bxis) 2 s^^^f 

switches in sign from + to — to + to — as s increases from 4 to oo, and the 
switches occur at certain points si = 5.2 . . . , s 2 = 21.5 . . . , and S3 = 42.7 .... 
So, pxis) switches from increase to decrease to increase as s increases from 4 
to si = 5.2 ... to S2 — 21.5 ... to s, = 31.4 . . . , and then p\{s) switches from 
increase to decrease as s increases from s* = 31.4 ... to S3 = 42.7 ... to 00. Next, 
Pi(s) < for s £ {4, si, S2, S3}; also, pi(s*) < 0, whence pi(s« — ) = 00 > and 
p~i(s*+) = —00 < (on recalling the definitions of pi(s) and s*). It follows that 
pi(s) switches in sign from — to + as s increases from 4 to s,, and pi < on 
(s*,oo). Therefore, pi(s) switches in sign from + to — as s increases from 4 to 
00. Equivalently, p'(r) switches in sign from — to + as r increases from 0 to 1/5. 
This implies that p(r) switches from decrease to increase as r increases from 0 
to 1/5. Equivalently, (|) p p(p) switches from decrease to increase as p increases 
from 1 to 2. Note also that ,5(1+) = ,5(2-) = p(2) = 0. So, indeed p(p) < 0, 
for all p £ (1,2). This proves part (v) and thus the entire proposition, modulo 
Lemmas 2.1-2.4. 

Proof of Lemma 2.1. Introduce the new variable $y := \frac{1-x}{x}$, so that $y > 1$ for $x \in (0,\frac12)$. Then, for any $p \in (1,2)$ and $x \in (0,\frac12)$,
$$\ell_{x,x}(p,x)\,\frac{(1-x)^{2-p}}{p(p-1)} = 1 - (2-p)\,y^{3-p} - (3-p)\,y^{2-p} < 1 - (2-p) - (3-p) = 2(p-2) < 0,$$
which proves the lemma. □

Proof of Lemma 2.2. Take indeed any $p \in (1,2)$. Note that (2.6) is equivalent to $\tilde B(p) := \ln\bigl(4(p-1)^{p-1}\bigr) - \ln\bigl((6-p)^{p-1}\bigr) > 0$. Next, $\tilde B'(p) = 1 + r + \ln r$, where $r := \frac{p-1}{6-p}$, so that $\tilde B'(p)$ is increasing in $p$, and $\tilde B'(2) < 0$, which implies that $\tilde B'(p) < 0$ and hence $\tilde B(p)$ is decreasing in $p$, with $\tilde B(2) = 0$. Thus, indeed $\tilde B(p) > 0$. □
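The two facts used in the proof of Lemma 2.2 are easy to test numerically. The following sketch (Python, hypothetical function names) evaluates $B(p)$ from (2.6) and its logarithmic version, and compares a finite-difference derivative with the closed form $1 + r + \ln r$, $r = (p-1)/(6-p)$.

```python
import math


def B(p):
    """B(p) = 4 (p-1)^(p-1) - (6-p)^(p-1), as in (2.6)."""
    return 4.0 * (p - 1.0) ** (p - 1.0) - (6.0 - p) ** (p - 1.0)


def log_B(p):
    """ln(4 (p-1)^(p-1)) - ln((6-p)^(p-1)); positive iff B(p) > 0."""
    return math.log(4.0) + (p - 1.0) * (math.log(p - 1.0) - math.log(6.0 - p))


def log_B_prime(p):
    """Closed form 1 + r + ln r with r = (p-1)/(6-p)."""
    r = (p - 1.0) / (6.0 - p)
    return 1.0 + r + math.log(r)
```

Since the closed-form derivative is increasing in $p$ and negative at $p = 2$, it is negative throughout $(1,2)$, which is exactly the monotonicity the proof relies on.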

Proof of Lemma 2.3. Throughout the proof, it is assumed that indeed p £ (1,2) 
and x £ (0, 5). Let 

so that equals 4 in sign. Then ^(D x t)(p,x) = (p-2)(p-l)(l-x)-Px p - 3 < 
0, so that (D x £)(p,x) decreases in x. Consider now 

H(p) := (D x £)(p, x* p ) = (27 - Apf-^Ap - 2f ~ 2 (21p - 23) - 1. 

Obviously, H(jp) < for p < ||. Let us show that i/(p) < for p £ (|f , 2) as 
well. Observe that 

^ 4(27-4p)P- 1 (2p~l) 2 (4p-2)-P 

ff(p) 21^23 

25 (42p 2 -92p + 73) 4p - 2 

= Hi (p) := ^—7 — r + m . 

(27-4p)(2p- l)(21p-23) 27 - Ap 

Using the Mathematica command Minimize, one finds that Hi (p) > and hence 
H'(p) > for p £ (|f, 2]. Since H{2) = 0, it indeed follows that H(p) < for 






p G (|f, 2) and thus for allp G (1, 2). So, one has (D x £)(p, x*) < 0. Recalling that 
(D x £)(p,x) decreases in x, one has (D x £)(p,x) < or, equivalently, £ x (p,x) < 
— provided that x ^ x*. □ 

Proof of Lemma 2.4- Throughout the proof, it is assumed that indeed p6 (1,2) 
and x G (0, \). Let 

<n £)<n r) - lp{p > X) - xP ' 1 ( l + (P~ x ) ln3 _ I 

{ u p i)[p,x). _( 1 _ x y ]n ( 1 _ x - ) -(l- x )P\n(l-x) ' 

(DDiVv x) - gWKgif) ln ^-") 

{ u p u p i m x).- gp ^ ^ , 

so that Dpi and D p D p £ equal £ p and d ^ p t> in sign, respectively. Then 

■^(D p D p £)(p,x) = ln(l — a;) - lnx > (since a; € (0, i)), so that (D p D p i)(p,x) 
increases in p. Consider now 

_. „. . , [4+(21x + 2)lna?]In(l-x)-[8 + (21x + 2)lnx]hia! 
{D p D p £){p x ,x) = 



4 In a: 

Observe that 1 < p* x < 2 ^ < x < and then use the Mathematica 

command Reduce to find that (D p D p £)(p x , x) > provided that ^ < x < 
Similarly, (D p D p l) (1, a;) > provided that < x < ^. Thus, (D p D p £)(l V 
> for all x G (0, ^). Recalling that (D p D p £)(p,x) increases in p, one 
has (D p Dp£)(p, x) > for all p G [1 Vp* , 2). It follows that (D p £)(p, x) increases 
in p G [1 V p*,2). Now use Reduce to check that (D p £)(2, x) < 0, which yields 
(D p £)(p,x) < or, equivalently, £ p (p,x) < for p G [1 Vp*,2) or, equivalently, 
for x ^ x*. □ 



2.3. Proofs of Corollary 1.9 and Propositions 1.10 and 1.11

First in this subsection we shall prove Proposition 1.11, then Proposition 1.10, 
and finally Corollary 1.9. 

Proof of Proposition 1.11. The convexity of $U_f(c,s,a)$ in $a \in \mathbb{R}$ follows immediately from that of $f$. Since $f'$ is strictly positive and nondecreasing on $(0,\infty)$, it follows that $f(\infty-) = \infty$; similarly (or because $f$ is even), $f(-\infty+) = \infty$. So, $U_f(c,s,a) \to \infty$ as $|a| \to \infty$. Therefore and by continuity, there is a minimizer of $U_f(c,s,a)$ in $a \in \mathbb{R}$. Take any such minimizer, say $a_*$. Since $f \in C^1(\mathbb{R})$, the partial derivative of $U_f(c,s,a)$ in $a$ at $a = a_*$ is 0; that is, $c f'(s - c + a_*) + (s-c) f'(a_* - c) = 0$, which can be rewritten as
$$c f'(s - c + a_*) = (s-c) f'(c - a_*), \tag{2.7}$$
since $f$ is even and hence $f'$ is odd. Recall also that $f'$ is strictly positive and hence nowhere zero on $(0,\infty)$. It follows that the arguments $s - c + a_*$ and $c - a_*$ of $f'$ in (2.7) must be of the same sign; noting that the sum of these arguments is $s > 0$, one concludes that they must be both positive; equivalently, $a_* \in (c - s, c)$. Moreover, $f'$ is positive and nondecreasing on $(0,\infty)$ and $0 < c < s - c$, so that (2.7) yields $f'(s - c + a_*) > f'(c - a_*)$ and hence
$$s - c + a_* > c - a_*. \tag{2.8}$$

If a minimizer of $U_f(c,s,a)$ in $a$ is not unique, then the first two partial derivatives of $U_f(c,s,a)$ in $a$ are identically zero for all $a$ in some nonempty open interval $(a_1,a_2) \subset (c - s, c)$. That is, $c f'(s - c + a) = (s-c) f'(c - a)$ and $c f''(s - c + a) + (s-c) f''(a - c) = 0$ for all $a \in (a_1,a_2)$. Since $f''$ is nonnegative and even, it follows that $f''(c - a) = f''(a - c) = 0$ for all $a \in (a_1,a_2)$, so that $f'' = 0$ on the interval $(c - a_2, c - a_1)$. Because $a_2 \le c$ and $f''$ is nonnegative and nonincreasing on $(0,\infty)$, one has $f'' = 0$ on the interval $(c - a_2, \infty)$, so that $f'$ is constant on the same interval. On recalling (2.8), one has $s - c + a > c - a > c - a_2$ for any $a \in (a_1,a_2)$, which shows that $f'(s - c + a) = f'(c - a)$; however, this contradicts the previously obtained inequality $f'(s - c + a_*) > f'(c - a_*)$ for any minimizer $a_*$.

Next, the formula (1.23) for the unique minimizer of $U_{\psi_t}(c,s,a)$ in $a$ is easy to verify by noting that the partial derivative of $U_{\psi_t}(c,s,a)$ in $a$ at $a = \frac{c}{s-c}(s - c - t)_+$ is 0. Moreover, for any real $c$ and $t$ such that $c > t > 0$ one has



U^,,(c, s, 0) t f 
7- — > 2—7^, and then 2 — — ► 2, which shows that kw,. 



It remains to prove that the unique minimizer $a = a_{f;c,s}$ is nonnegative. Equivalently, it remains to show that the partial derivative of $U_f(c,s,a)$ in $a$ is no greater than 0 at $a = 0$, that is,
$$c f'(s - c) \le (s - c) f'(c). \tag{2.9}$$
By the linearity relation (1.9) and homogeneity, w.l.o.g. $f = \psi_t$ for some $t \in (0,\infty)$, in which case (2.9) is equivalent to $a_{\psi_t;c,s} \ge 0$, and that is obvious from (1.23). □
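The statements about the minimizer for $f = \psi_t$ can be explored numerically. The sketch below rests on three assumptions that are inferred from the surrounding text rather than quoted: the closed form $\psi_t(x) = x^2 - (|x| - t)_+^2$ (which follows from (2.1) together with $\psi_t(0) = 0$), the expression $U_f(c,s,a) = c f(s - c + a) + (s - c) f(a - c)$ (consistent with the first-order condition preceding (2.7)), and the candidate minimizer $a = \frac{c}{s-c}(s - c - t)_+$ obtained by solving (2.7) for $\psi_t$. All function names are hypothetical.

```python
def psi(t, x):
    """Assumed closed form of psi_t: x^2 - (|x| - t)_+^2."""
    return x * x - max(abs(x) - t, 0.0) ** 2


def U(t, c, s, a):
    """Assumed form of U_f(c, s, a) for f = psi_t."""
    return c * psi(t, s - c + a) + (s - c) * psi(t, a - c)


def a_star(t, c, s):
    """Candidate minimizer of U in a, valid for 0 < c < s/2."""
    return c * max(s - c - t, 0.0) / (s - c)


def grid_minimum(t, c, s, points=4001):
    """Smallest value of U over a uniform grid of a in [c - s, c]."""
    step = s / (points - 1)
    return min(U(t, c, s, (c - s) + k * step) for k in range(points))
```

Because $U$ is convex in $a$, agreement between `U(t, c, s, a_star(t, c, s))` and the grid minimum is a meaningful check that the candidate is the global minimizer, and its nonnegativity is visible directly from the formula.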

Proof of Proposition 1.10. Take indeed any $f \in \mathcal{F}_{1,2} \setminus \{0\}$. By e.g. [52, Proposition 3.18], any zero-mean probability distribution on $\mathbb{R} \setminus \{0\}$ is a mixture of zero-mean probability distributions on 2-point sets. Therefore, w.l.o.g. the zero-mean r.v. $X$ takes on only two values, so that $X = X_{c,d}$, where $c$ and $d$ are positive real numbers, and $X_{c,d}$ is a r.v. such that $\mathsf{P}(X_{c,d} = -c) = \frac{d}{c+d}$ and $\mathsf{P}(X_{c,d} = d) = \frac{c}{c+d}$. Take now any $c$ and $s$ such that $0 < c < s < \infty$, and introduce
$$R_f(c,s,a) := \frac{U_f(c,s,0)}{U_f(c,s,a)} = \frac{\mathsf{E} f(X_{c,s-c})}{\mathsf{E} f(X_{c,s-c} + a)}. \tag{2.10}$$
So, the best constant $\kappa$ in (1.22) is given by a formula similar to (1.18), but with the restrictions $c \in (0,s)$ and $a \in \mathbb{R}$ instead of $c \in (0,\frac{s}{2})$ and $a \in (0,c)$. That $c \in (0,s)$ can be reduced to $c \in (0,\frac{s}{2})$ follows by the symmetry relation $R_f(c,s,a) = R_f(s-c,s,-a)$ and the continuity of $R_f(c,s,a)$ in $c$. Finally, the condition $a \in \mathbb{R}$ can be reduced to $a \in (0,c)$ by Proposition 1.11 and the continuity of $R_f(c,s,a)$ in $a$. □

Proof of Corollary 1.9.

(I) Take indeed any / <G IF\ 2 \ {0}. Consider the martingale expansion 

Y = Ey + & + ■■•+£„ 

of Y with the martingale-differences 

Ci ■= Ej Y - Ej_! y (2.11) 

for i £ 1, n, where stands for the conditional expectation given the a- algebra 
generated by (X±, . . . ,Xi), with Eo := E. For each i 6 l,n introduce the r.v. 
f]i := Ej(y - Yi), where := g(X\, . . . ,Xi-i,Xi,X i+ x, . . . ,X n ); then, in view 
of (1.16) or (1.26), |?7i| ^ pi(Xi,Xi); because f(u) is increasing in \u\, it follows 
that f(r}i) < f(pi(Xi,Xi)) and hence Ef(r]i) < E/(pi(Jf<,Xj)); also, & = rfc - 
Ei-ir/i, since the r.v.'s Xi,...,X n are independent. Now (1.17) follows from 
Theorem 1.1 and Proposition 1.10, which latter yields Ej_i /(£») ^ «;/ Ej_i /(r/j) 
and hence E /(&) < «/ E /(r/j). 

To check the inclusion $\kappa_f \in [1,2]$ in (1.18), note first that the inequality $\kappa_f \ge 1$ follows by the continuity of $U_f(c,s,a)$ in $a$, at $a = 0$. As for the inequality $\kappa_f \le 2$, it can be rewritten as
$$U_f(c,s,0) \le 2\,U_f(c,s,a) \tag{2.12}$$
for all $s \in (0,\infty)$, $c \in (0,\frac{s}{2})$, and $a \in (0,c)$, where w.l.o.g. $f = \psi_t$ (for some $t \in (0,\infty)$, by (1.19) and (1.6)) and $s = 1$ (by homogeneity). Take then indeed any $c \in (0,\frac12)$ and $a \in (0,c)$. By Proposition 1.11, w.l.o.g. $a = a_{\psi_t;c,1}$. Using a Simplify Mathematica command for $U_{\psi_t}(c,1,a_{\psi_t;c,1})$ and then following with a Reduce, one quickly verifies that (2.12) indeed holds for $f = \psi_t$. This completes the proof of part (I) of Corollary 1.9.

(II) To obtain the expression in (1.21) for $\kappa_p = \kappa_{|\cdot|^p}$, note first that, by homogeneity of the power function $f = |\cdot|^p$, w.l.o.g. $s = 1$. Then solve the equation (2.7) to find the unique minimizer
$$a_{p;c} := a_{|\cdot|^p;c,1} = c - \frac{c^{1/(p-1)}}{c^{1/(p-1)} + (1-c)^{1/(p-1)}} \tag{2.13}$$
of $U_p(c,a) := U_{|\cdot|^p}(c,1,a)$ in $a$. Finally, substitute this minimizer for $a$ in $R_p(c,a) := \frac{U_p(c,0)}{U_p(c,a)}$ and simplify, to show that $r_c(p) := R_p(c,a_{p;c})$ equals the expression under the max sign in (1.21).

The continuity of $\kappa_p$ in $p$ follows because $r_c(p)$ is continuous in $p \in (1,2]$ uniformly in $c \in [0,\frac12]$ (indeed, the derivative, $r_c'(p)$, of $r_c(p)$ in $p$ is bounded over all $c \in [0,\frac12]$ and all $p$ in any compact subinterval of $(1,2]$). That $\kappa_2 = 1$ is trivial. To check that $\kappa_{1+} = 2$, observe that $r_{p-1}(p) \to 2$ as $p \downarrow 1$ and recall that $\kappa_f \le 2$ for all $f \in \mathcal{F}_{1,2} \setminus \{0\}$. The statements that the values of $\kappa_p$ are algebraic for all rational $p \in (1,2]$ and $\kappa_{3/2} = \frac19\sqrt{51 + 21\sqrt7} = 1.14\ldots$, corresponding to $c = \frac16\bigl(3 - \sqrt{1 + 2\sqrt7}\bigr) = 0.081\ldots$, are straightforward to check.

It remains to prove that $\kappa_p$ strictly decreases in $p \in (1,2]$. To accomplish this, it is enough to show that $r_c(p)$ does so for each $c \in (0,\frac12)$, since $r_0(p) = r_{1/2}(p) = 1$ for all $p \in (1,2]$ and $r_c(2) = 1$ for all $c \in [0,\frac12]$. Take indeed any $p \in (1,2)$ and $c \in (0,\frac12)$ and observe that $(\ln r_c)'(p) = r_1 + r_2 - \frac{1}{p-1}r_3$, where
$$r_1 := \frac{c^{p-1}\ln c + (1-c)^{p-1}\ln(1-c)}{c^{p-1} + (1-c)^{p-1}},$$
$$r_2 := \ln\bigl(c^{1/(p-1)} + (1-c)^{1/(p-1)}\bigr),$$
$$r_3 := \frac{c^{1/(p-1)}\ln c + (1-c)^{1/(p-1)}\ln(1-c)}{c^{1/(p-1)} + (1-c)^{1/(p-1)}}.$$
Note that $r_1 + r_2 - \frac{1}{p-1}r_3 = R_1 + R_2$, where $R_1 := r_1 - r_3$ and $R_2 := r_2 + (1 - \frac{1}{p-1})r_3$. Observe that
i?i = A-^j < 0, (2.14) 

( C^=T + (1 - C)F^T J (cP" 1 + (1 ~ c)P- r ) 



since i— 2 



> 1 and p- 1 < 1 < 
It remains to show that $R_2 < 0$. Consider the new variable
$$b := \frac{c^{1/(p-1)}}{c^{1/(p-1)} + (1-c)^{1/(p-1)}},$$
so that $b \in (0,\frac12)$ and $c = \frac{b^{p-1}}{b^{p-1} + (1-b)^{p-1}}$. Then one can check that
$$R_2 = h(b) := (p-2)\bigl(b\ln b + (1-b)\ln(1-b)\bigr) - \ln\bigl(b^{p-1} + (1-b)^{p-1}\bigr) \tag{2.15}$$
and 

/ l "(6)6 2 -P(l - 6) 2 -p(6p- 1 + (1 - 6)p-!) 2 = h 21 (b)h 22 {b), (2.16) 



where 



with /i' 21 (6) = (p - 2)(p - l)(j^) P (l - 6)- 3 < and ^ 22 (6) = (p - 2) 
x (p — l)(j3j) P 6 -3 < 0, so that both h 2 i(b) and h 22 (b) are decreasing in b. 
Since ft 2 i(i) = 2(2 - p) > 0, it follows that /i 21 > on (0, ±). So, /i"(6) equals 
/122(b) in sign. Since h 22 (0+) = 00 > and /i2a(|) = 2 (p - 2) < 0, both /122(b) 
and h"(b) switch from + to — as b increases from to \ . Therefore, h(b) switches 
from convexity to concavity in b € (0, |). At that, h(0+) = h{\) = h'{\) = 0. 
It follows that h < and hence R 2 < 0. This completes the proof of part (II) 
and thus that of the entire Corollary 1.9. □ 
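The closed-form value of $\kappa_{3/2}$ quoted in part (II) can be confirmed numerically. The sketch below assumes the product form $r_c(p) = \bigl(c^{p-1} + (1-c)^{p-1}\bigr)\bigl(c^{1/(p-1)} + (1-c)^{1/(p-1)}\bigr)^{p-1}$, which is what results from substituting the solution of (2.7) into $U_{|\cdot|^p}(c,1,0)/U_{|\cdot|^p}(c,1,a)$ with $U_f(c,s,a) = c f(s-c+a) + (s-c) f(a-c)$; this form is derived here and not copied from (1.21). Function names and the grid size are arbitrary choices.

```python
import math


def r(c, p):
    """Assumed product form of r_c(p); see the hedge in the lead-in."""
    k = 1.0 / (p - 1.0)
    return (c ** (p - 1.0) + (1.0 - c) ** (p - 1.0)) * (c ** k + (1.0 - c) ** k) ** (p - 1.0)


def kappa(p, n=20000):
    """Grid approximation to the maximum of r_c(p) over c in [0, 1/2]."""
    return max(r(i / (2.0 * n), p) for i in range(n + 1))


# Closed-form values stated in the text for p = 3/2.
KAPPA_3_2 = math.sqrt(51.0 + 21.0 * math.sqrt(7.0)) / 9.0
C_3_2 = (3.0 - math.sqrt(1.0 + 2.0 * math.sqrt(7.0))) / 6.0
```

For $p = \frac32$ the first-order condition reduces to the quadratic $6w^2 + 2w - 1 = 0$ in $w = \sqrt{c(1-c)}$, which is where the nested square roots in the closed form come from.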



2.4. Proof of Theorem 1.1 

This proofs proceeds in reductive steps. First, the theorem is reduced to the 
case n = 2, which is mainly treated by Lemma 2.5. In turn, Lemma 2.5 is 
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largely reduced to the technical Lemma 2.6, which provides a few other rounds 
of reduction, one of which treated in Sublemma 2.7. We shall state these two 
lemmas and the sublemma just where they are needed, postponing their proofs 
till later in this subsection. 

(I, II) By induction and conditioning, parts (I) and (II) of Theorem 1.1
follow immediately from

Lemma 2.5.

(I) For any $f \in \mathcal{F}_{1,2} \setminus \{0\}$, $x \in \mathbb{R}$, and zero-mean r.v. $Y$,

$$\mathsf{E} f(x + Y) \le f(x) + C_f\,\mathsf{E} f(Y).$$

(II) For any $f \in \mathcal{F}_{1,2} \setminus \{0\}$, one has the following: if a constant factor $C$ is
such that

$$\mathsf{E} f(X + Y) \le \mathsf{E} f(X) + C\,\mathsf{E} f(Y) \tag{2.17}$$

for all independent zero-mean r.v.'s $X$ and $Y$, then $C \ge C_f$.

We shall turn to the proof of this lemma in a moment, after the proof of parts 
(III) and (IV) of Theorem 1.1 is completed. 

(III) Take any $f \in \mathcal{F}_{1,2} \setminus \{0\}$. The inequality $C_f \ge 1$ follows by (1.3), since
$L_{f;s}(x) \to f(s)$ as $x \downarrow 0$. On the other hand, in view of Proposition 1.6 and (1.3),
one has $L_{\psi_t;s}(x) \le 2\psi_t(s)$ for any $t \in (0,\infty]$ and $x, s$ such that $0 < x < s < \infty$;
so, (1.6) implies $L_{f;s}(x) \le 2f(s)$, whence, by (1.3), $C_f \le 2$.

(IV) Part (IV) of Theorem 1.1 follows immediately from Propositions 1.6 
and 1.8. 

Thus, Theorem 1.1 is proved, modulo Lemma 2.5. 

Proof of Lemma 2.5.

(I) An argument as in the proof of Proposition 1.10 shows that w.l.o.g. the
zero-mean r.v. $Y$ takes on only two values, so that $Y = X_{c,d}$, where $c$ and $d$
are positive real numbers, and $X_{c,d}$ is a r.v. such that $\mathsf{P}(X_{c,d} = -c) = \frac{d}{c+d}$
and $\mathsf{P}(X_{c,d} = d) = \frac{c}{c+d}$. Take now any $c$ and $s$ such that $0 < c < s < \infty$, and
introduce

$$g_{f;c,s}(x) := \mathsf{E} f(x + X_{c,s-c}) - f(x) \quad\text{and}\quad J_{f;c,s}(x) := \frac{g_{f;c,s}(x)}{g_{f;c,s}(0)}; \tag{2.18}$$

the latter definition is correct, because $f > 0$ on $\mathbb{R} \setminus \{0\}$ and hence $g_{f;c,s}(0) =
\mathsf{E} f(X_{c,s-c}) > 0$. Observe also that $J_{f;c,s}(-x) = J_{f;s-c,s}(x)$. Thus, part (I) of
Lemma 2.5 reduces to

$$\tilde C_f := \sup\{J_{f;c,s}(x)\colon 0 < x < \infty,\ 0 < c < s < \infty\} \overset{(?)}{\le} C_f. \tag{2.19}$$

Accordingly, let us take any $c$ and $s$ such that $0 < c < s < \infty$.

Using integration by parts (or, more precisely, the Fubini theorem), one has
the Taylor expansion $f(x + k) = f(x) + k f'(x) + k^2 \int_0^1 (1 - z) f''(x + kz)\,dz$ for
all real $x$ and $k$, whence

$$s\,g_{f;c,s}(x) = c f(x + s - c) + (s - c) f(x - c) - s f(x) \tag{2.20}$$

$$= (s - c)c \int_0^1 (1 - z)\bigl[(s - c) f''\bigl(x + (s - c)z\bigr) + c f''(x - cz)\bigr]\,dz, \tag{2.21}$$

which is nonincreasing in $x \in [s, \infty)$, since $f''$ is nonincreasing on $(0, \infty)$. Hence,
by (2.18), $J_{f;c,s}(x)$ is nonincreasing in $x \in [s, \infty)$. It follows that the condition
$0 < x < \infty$ in (2.19) can be replaced by $0 < x < s$. So, inequality (2.19) and
thus part (I) of Lemma 2.5 follow by

Lemma 2.6. For any $f \in \mathcal{F}_{1,2} \setminus \{0\}$ and any $x, c, s$ such that $0 < x < s$ and
$0 < c < s < \infty$,

$$J_{f;c,s}(x) \le \frac{L_{f;s}(x)}{f(s)}. \tag{2.22}$$

We shall turn to the proof of this lemma in a moment, after the proof of part 
(II) of Lemma 2.5 is completed. 
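The identity (2.20)–(2.21) and the resulting monotonicity in $x \ge s$ can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it takes $f = \ln\cosh$, which is even, vanishes at 0, and has $f'' = 1/\cosh^2$ nonnegative and nonincreasing on $(0,\infty)$, so it appears to belong to $\mathcal{F}_{1,2}$; this choice of $f$, the function names, and the use of Simpson's rule are assumptions made here and do not come from the paper.

```python
import math

def f(x):
    """ln cosh x, written in a numerically stable way (illustrative choice of f)."""
    x = abs(x)
    return x + math.log1p(math.exp(-2.0 * x)) - math.log(2.0)

def f2(x):
    """Second derivative of ln cosh."""
    return 1.0 / math.cosh(x) ** 2

def lhs_2_20(x, c, s):
    """Right-hand side of (2.20): c f(x+s-c) + (s-c) f(x-c) - s f(x)."""
    return c * f(x + s - c) + (s - c) * f(x - c) - s * f(x)

def rhs_2_21(x, c, s, n=2000):
    """Integral form (2.21), evaluated by the composite Simpson rule (n even)."""
    def integrand(z):
        return (1 - z) * ((s - c) * f2(x + (s - c) * z) + c * f2(x - c * z))
    step = 1.0 / n
    total = integrand(0.0) + integrand(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * step)
    return (s - c) * c * total * step / 3
```

Comparing `lhs_2_20` and `rhs_2_21` at a few points, and evaluating `lhs_2_20` along increasing $x \ge s$, reproduces the two facts used in the proof.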

(II) To prove part (II) of Lemma 2.5, take indeed any $f \in \mathcal{F}_{1,2} \setminus \{0\}$. Let
$c$ and $s$ be as in the proof of part (I) of Lemma 2.5, so that $0 < c < s < \infty$.
Since $f''$ is even on $\mathbb{R}$ and nonnegative and nonincreasing on $(0, \infty)$, the identity
(2.21) implies that $g_{f;c,s}(u)$ converges to a finite limit as $u \to -\infty$, and then so
does $J_{f;c,s}(u)$. Let now $a$ and $b$ be any positive real numbers. Then

$$\frac{\mathsf{E} f(X_{a,b} + X_{c,s-c}) - \mathsf{E} f(X_{a,b})}{\mathsf{E} f(X_{c,s-c})} = \frac{b}{a+b}\,J_{f;c,s}(-a) + \frac{a}{a+b}\,J_{f;c,s}(b) \underset{a\to\infty}{\longrightarrow} J_{f;c,s}(b),$$

assuming that the r.v.'s $X_{a,b}$ and $X_{c,s-c}$ are independent. So, the constant $C$ in
(2.17) cannot be less than $J_{f;c,s}(b)$, for any $c, s, b$ such that $0 < c < s < \infty$ and
$0 < b < \infty$. That is, $C$ must be no less than $\tilde C_f$, the left-hand side of (2.19).
On the other hand, by l'Hospital's rule, for any $x$ and $s$ such that $0 < x < s < \infty$,

$$J_{f;c,s}(x) \underset{c \uparrow s}{\longrightarrow} \frac{L_{f;s}(x)}{f(s)}, \tag{2.23}$$

with $L_{f;s}(x)$ as in (1.4). So, in view of (1.3), $C_f \le \tilde C_f$, and thus $C \ge C_f$. So,
Lemma 2.5 is proved, modulo Lemma 2.6. □
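The two limits used in part (II) can be observed numerically. This is a hedged sketch, again with the illustrative choice $f = \ln\cosh$ (an assumption made here, not taken from the paper) and with hypothetical function names; $L_{f;s}(x)$ is taken in the form $f(x-s) - f(x) + s f'(x)$, which is the form that appears for $\psi_t$ in the proof of Lemma 2.6.

```python
import math

def f(x):
    """ln cosh x (illustrative member of the class; an assumption of this sketch)."""
    x = abs(x)
    return x + math.log1p(math.exp(-2.0 * x)) - math.log(2.0)

def g(x, c, s):
    """g_{f;c,s}(x) = E f(x + X_{c,s-c}) - f(x), as in (2.18)."""
    return (c * f(x + s - c) + (s - c) * f(x - c)) / s - f(x)

def J(x, c, s):
    """J_{f;c,s}(x) = g_{f;c,s}(x) / g_{f;c,s}(0), as in (2.18)."""
    return g(x, c, s) / g(0.0, c, s)

def L_over_f(x, s):
    """L_{f;s}(x) / f(s), using f' = tanh for f = ln cosh."""
    return (f(x - s) - f(x) + s * math.tanh(x)) / f(s)

def lower_bound_ratio(a, b, c, s):
    """[E f(X_{a,b} + X_{c,s-c}) - E f(X_{a,b})] / E f(X_{c,s-c}) for independent summands."""
    return (b * J(-a, c, s) + a * J(b, c, s)) / (a + b)
```

Taking $c$ close to $s$ in `J` approaches `L_over_f`, as in (2.23), and taking $a$ large in `lower_bound_ratio` approaches `J(b, c, s)`.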

Proof of Lemma 2.6. Take indeed any $f \in \mathcal{F}_{1,2} \setminus \{0\}$ and any $x, c, s$ such that
$0 < x < s$ and $0 < c < s < \infty$. Let $\gamma := \gamma_f$ be the measure as in Proposition 1.2.
Then, by (2.18), (2.20), (1.4), (1.6), and (1.9), inequality (2.22) can be rewritten
as

$$\int_0^\infty\!\!\int_0^\infty \lambda_t(s;x,c)\,\psi_u(s)\,\gamma(dt)\,\gamma(du) \overset{(?)}{\le} \int_0^\infty\!\!\int_0^\infty \mu_t(s;c)\,\nu_u(s;x)\,\gamma(dt)\,\gamma(du), \tag{2.24}$$

where

$$\lambda_t(s;x,c) := s\,g_{\psi_t;c,s}(x) = c\,\psi_t(x + s - c) + (s - c)\,\psi_t(x - c) - s\,\psi_t(x),$$
$$\mu_t(s;c) := \lambda_t(s;0,c) = c\,\psi_t(s - c) + (s - c)\,\psi_t(c),$$
$$\nu_t(s;x) := L_{\psi_t;s}(x) = \psi_t(x - s) - \psi_t(x) + s\,\psi_t'(x).$$

Clearly, the left-hand side of (2.24) will not change if $\lambda_t(s;x,c)\,\psi_u(s)$ there is
replaced by $\lambda_u(s;x,c)\,\psi_t(s)$; a similar statement holds concerning the right-hand
side of (2.24). Because of this symmetry, it is enough to prove that

$$\lambda_t(s;x,c)\,\psi_u(s) + \lambda_u(s;x,c)\,\psi_t(s) \overset{(?)}{\le} \mu_t(s;c)\,\nu_u(s;x) + \mu_u(s;c)\,\nu_t(s;x) \tag{2.25}$$

for all $u$ and $t$ in $(0, \infty)$ and all $x, c, s$ such that $0 < x < s$ and $0 < c < s < \infty$.
Using the homogeneity relations

$$\psi_t(z) = s^2\,\psi_{\tilde t}(\tilde z) \quad\text{and}\quad \psi_t'(z) = s\,\psi_{\tilde t}'(\tilde z),$$

where $\tilde t := t/s$, $\tilde z := z/s$, $t \in (0, \infty)$, and $z \in \mathbb{R}$, one has $s = 1$ w.l.o.g., so that,
with

$$\Lambda_t(x,c) := \lambda_t(1;x,c), \quad \Psi_t := \psi_t(1), \quad M_t(c) := \mu_t(1;c), \quad N_t(x) := \nu_t(1;x),$$

inequality (2.25) can be further rewritten as

$$\Delta := \Delta_{u,t}(x,c) := \Lambda_t(x,c)\,\Psi_u + \Lambda_u(x,c)\,\Psi_t - M_t(c)\,N_u(x) - M_u(c)\,N_t(x) \overset{(?)}{\le} 0, \tag{2.26}$$

to be proved given the restrictions

$$0 < x < 1, \quad 0 < c < 1, \quad 0 < u < \infty, \quad 0 < t < \infty. \tag{2.27}$$

Note that $\Delta$ is piecewise-polynomial in $x, c, u, t$, and the restrictions (2.27)
on the variables $x, c, u, t$ are linear (or, more exactly, affine). So, as discussed
in the beginning of Section 2, Mathematica commands such as Reduce can be
used here.
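Before any symbolic treatment, inequality (2.26) can be spot-checked by random sampling under the restrictions (2.27). The sketch below is not the Reduce-based proof: it assumes the explicit form $\psi_t(z) = z^2$ for $|z| \le t$ and $\psi_t(z) = 2t|z| - t^2$ otherwise, which agrees with the remark in this proof that $\psi_t(z) = z^2$ and $\psi_t'(z) = 2z$ for $z < t$ but is not written out in this subsection; all function names are hypothetical.

```python
import random

def psi(t, z):
    """Assumed form of psi_t: quadratic on [-t, t], linear outside."""
    z = abs(z)
    return z * z if z <= t else 2.0 * t * z - t * t

def dpsi(t, z):
    """Derivative of the assumed psi_t."""
    if abs(z) <= t:
        return 2.0 * z
    return 2.0 * t if z > 0 else -2.0 * t

def Lam(t, x, c):
    """Lambda_t(x, c) = lambda_t(1; x, c)."""
    return c * psi(t, x + 1 - c) + (1 - c) * psi(t, x - c) - psi(t, x)

def M(t, c):
    """M_t(c) = mu_t(1; c) = Lambda_t(0, c)."""
    return Lam(t, 0.0, c)

def N(t, x):
    """N_t(x) = nu_t(1; x)."""
    return psi(t, x - 1) - psi(t, x) + dpsi(t, x)

def Delta(u, t, x, c):
    """Left-hand side of (2.26)."""
    return (Lam(t, x, c) * psi(u, 1.0) + Lam(u, x, c) * psi(t, 1.0)
            - M(t, c) * N(u, x) - M(u, c) * N(t, x))

def max_delta(samples=20000, seed=0):
    """Largest sampled value of Delta; sampling u, t up to 2.5 suffices, since
    the assumed psi_t does not depend on t >= 2 for arguments in [0, 2)."""
    rng = random.Random(seed)
    worst = float("-inf")
    for _ in range(samples):
        x, c = rng.uniform(0.001, 0.999), rng.uniform(0.001, 0.999)
        u, t = rng.uniform(0.001, 2.5), rng.uniform(0.001, 2.5)
        worst = max(worst, Delta(u, t, x, c))
    return worst
```

A nonpositive sampled maximum (up to rounding) is only a sanity check; it does not replace the exhaustive case analysis that follows.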

We shall make several observations in order to reduce the computational complexity
of the problem (2.26)–(2.27). In particular, at the end of this subsection
we shall prove

Sublemma 2.7. For any $c \in (0, \tfrac12]$, $x \in (0,1)$, and $t \in (0, \infty)$ one has

$$\Lambda_t(x,c) \le \Lambda_t(x, 1 - c).$$

Note also that $M_t(c) = M_t(1 - c)$. So, w.l.o.g.

$$\tfrac12 < c < 1. \tag{2.28}$$

Observe next that $\Delta = 0$ when $c = 1$. Hence, it suffices to show that

$$\Delta_c := -\frac{\partial\Delta}{\partial c} \overset{(?)}{\le} 0 \tag{2.29}$$

for all $x \in (0,1)$, $c \in (\tfrac12, 1)$, $u \in (0, \infty)$, and $t \in (0, \infty)$.

Observe also that $\Delta_c$ is expressed in terms of the values of the functions $\psi_u$,
$\psi_u'$, $\psi_t$, $\psi_t'$ of arguments $z$ in the set $Z := \{z_1, \dots, z_7\}$, where $(z_1, \dots, z_7) :=
(1,\ x,\ 1 + x - c,\ |x - c|,\ 1 - c,\ c,\ 1 - x)$. At that, one has one of two possible cases:

$$x \le c \quad\text{or}\quad x > c,$$

on which the algebraic expression of $|x - c|$ depends. In each of these two cases,
the expressions of $z_1, \dots, z_7$ in terms of $x, c$ represent pairwise distinct affine
forms in $x, c$. So, for any distinct $i$ and $j$ in the set $\overline{1,7}$, the set $\{(x,c) \in P\colon z_i =
z_j\}$ is nowhere dense in the nonempty open set $P = (0,1) \times (\tfrac12, 1)$. Therefore
and by the continuity of $\psi_t$ and $\psi_t'$, it is enough to prove inequality (2.29) for
all $(x, c, t, u) \in (0,1) \times (\tfrac12, 1) \times (0, \infty) \times (0, \infty)$ such that $z_i \ne z_j$ for any distinct
$i$ and $j$ in the set $\overline{1,7}$. Similarly, w.l.o.g. $x \ne c$. Moreover, by symmetry, w.l.o.g.

$$u < t. \tag{2.30}$$

For each $z \in Z$, the algebraic expressions for the values $\psi_u(z)$, $\psi_u'(z)$, $\psi_t(z)$,
$\psi_t'(z)$ depend on the signs of $z - u$ and $z - t$, respectively. Therefore, the order
in which the elements of the set $Z$ go (depending on the values of $x$ and $c$) is of
relevance. Overall, there are $7! = 5040$ permutations $\sigma$ of the set $\{1, \dots, 7\}$. Fortunately,
comparatively few of these permutations $\sigma$ are such that the ordering
$z_{\sigma(1)} < z_{\sigma(2)} < \dots < z_{\sigma(7)}$ may be compatible with restrictions (2.27)–(2.28);
namely, there are 10 such permutations with $x < c$ and 2 such permutations with
$x > c$ (corresponding to the orderings $x - c < 1 - x < 1 - c < c < x < 1 < 1 + x - c$
and $1 - x < x - c < 1 - c < c < x < 1 < 1 + x - c$), to the total of 12 permutations
that may be compatible with the restrictions (2.27)–(2.28); let $\Sigma$ denote the set
of these 12 permutations. (It takes Mathematica about 11–12 sec in each of the
two cases ($x < c$ and $x > c$) to select the compatible permutations.)
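The count of compatible orderings can be reproduced by brute force rather than by symbolic selection. The sketch below is a substitute technique, not the Mathematica procedure mentioned above: it evaluates $(z_1, \dots, z_7)$ exactly, using rational arithmetic, on a grid of points $(x, c) \in (0,1) \times (\tfrac12, 1)$, discards points with ties, and collects the orderings that occur. The grid size and the function names are assumptions of this illustration.

```python
from fractions import Fraction

def z_values(x, c):
    """(z_1, ..., z_7) = (1, x, 1+x-c, |x-c|, 1-c, c, 1-x)."""
    return (Fraction(1), x, 1 + x - c, abs(x - c), 1 - c, c, 1 - x)

def compatible_orderings(n=80):
    """Orderings z_sigma(1) < ... < z_sigma(7) realised on an n-by-n grid in
    (0, 1) x (1/2, 1), split according to whether x < c or x > c."""
    below, above = set(), set()
    for i in range(1, n):
        x = Fraction(i, n)
        for j in range(1, n):
            c = Fraction(1, 2) + Fraction(j, 2 * n)
            z = z_values(x, c)
            if len(set(z)) < 7:      # skip points where some z_i = z_j
                continue
            sigma = tuple(sorted(range(1, 8), key=lambda k: z[k - 1]))
            (below if x < c else above).add(sigma)
    return below, above
```

Each tuple lists the indices $\sigma(1), \dots, \sigma(7)$; for instance, `(4, 7, 5, 6, 2, 1, 3)` encodes $x - c < 1 - x < 1 - c < c < x < 1 < 1 + x - c$.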

For each permutation $\sigma \in \Sigma$, the value of $t$ may fall into one (say the $j$th one)
of the 8 intervals $[z_{\sigma(0)}, z_{\sigma(1)}), \dots, [z_{\sigma(7)}, z_{\sigma(8)})$, where $z_{\sigma(0)} := 0$ and $z_{\sigma(8)} := \infty$;
for each such $j$, the value of $u$ (which is less than $t$, according to the additional
restriction (2.30)) may fall either into the same $j$th interval or into any of the
intervals to the left of it. So, for each permutation $\sigma \in \Sigma$, there are $\tfrac12(8 \times
9) = 36$ ways for $t$ and $u$ to fall into one or two of the 8 intervals. Overall,
one has $12 \times 36 = 432$ cases to consider. (In fact, we disregard the restriction
$u < t$ in any of these 432 cases when both $u$ and $t$ fall into the same interval
$[z_{\sigma(j)}, z_{\sigma(j+1)})$, to make the set of all pairs $(u, t)$ simply a rectangle in $\mathbb{R}^2$ of the
form $[z_{\sigma(j)}, z_{\sigma(j+1)}) \times [z_{\sigma(k)}, z_{\sigma(k+1)})$, with $j \le k$.)
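The case count is a small combinatorial computation, sketched here with hypothetical names: the intervals are indexed $0, \dots, 7$, and a case is a pair of interval indices $(j, k)$ with $j \le k$ (for $u$ and $t$, respectively).

```python
from itertools import combinations_with_replacement

def interval_pairs(n_intervals=8):
    """All index pairs (j, k) with j <= k: u in interval j, t in interval k."""
    return list(combinations_with_replacement(range(n_intervals), 2))

def total_cases(n_permutations=12, n_intervals=8):
    """Number of (permutation, interval pair) cases."""
    return n_permutations * len(interval_pairs(n_intervals))
```

With the default arguments this gives 36 pairs per permutation and 432 cases overall.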

Using Mathematica commands Simplify and Reduce as explained above,
it turns out that the mentioned purely algebraic algorithm is too slow and/or
RAM-consuming. A small dose of calculus helps greatly here: we set up a preliminary
test to check whether $\Delta_c$ is convex in $u$ and/or in $t$, in each case of the 432
ones; then it is enough to check that $\Delta_c \le 0$ when at least one of the variables
$u, t$ takes a value at an endpoint of the corresponding interval $[z_{\sigma(j)}, z_{\sigma(j+1)})$.
(Since $z_{\sigma(8)}$ was defined as $\infty$, the right endpoint of the interval $[z_{\sigma(j)}, z_{\sigma(j+1)})$
is $\infty$ if $j = 7$, and one may wonder as to what the value of $\Delta_c$ is when $t$ equals $\infty$
and at that $u$ possibly equals $\infty$ as well. This problem is resolved by observing
that $\Delta_c$ is constant in $u \in [2, \infty)$ and in $t \in [2, \infty)$, which follows because, given
the restrictions (2.27), one has $Z \subset [0, 2)$, whereas $\psi_t(z) = z^2$ and $\psi_t'(z) = 2z$
do not depend on $t$ provided that $z \in [0, 2)$ and $t \in [2, \infty)$.)

With these preparations, most of the 432 cases can be processed rather 
quickly, each taking from a fraction of a second to a few seconds to (rarely) 
a few minutes. Overall, it takes just about 7 minutes on a standard Core 2 Duo 
laptop to process all of the 10 x 36 = 360 cases with x < c. 

However, in each of certain 13 cases of the $2 \times 36 = 72$ ones with $x > c$,
it takes about 10 sec or more (or much more) for the corresponding Reduce
command to finish; in each of these 13 more difficult cases, $\Delta_c$ is convex neither
in $u$ nor in $t$. With the execution time limit set at 10 sec, Mathematica processes
the $72 - 13 = 59$ easier cases with $x > c$ in about 3.5 minutes.

To deal with the remaining 13 cases, we set up another test, based on the 
following elementary observation: 

(i) if a function $h\colon [A, B] \to \mathbb{R}$ is such that $h''' \ge 0$, then $\max_{[A,B]} h \le
\max\{h(B), h(A), h(A) + h'(A)(B - A)\}$, so that, if the latter max is no
greater than 0, then $h \le 0$;

(ii) if a function $h\colon [A, B] \to \mathbb{R}$ is such that $h''' \le 0$, then $\max_{[A,B]} h \le
\max\{h(A), h(B), h(B) + h'(B)(A - B)\}$.

Each of the 13 remaining cases passes test (i) (with $h := \Delta_c$ considered as a
function of $u$), with the total execution time of about 4 minutes for all of the
13 cases. This completes the proof of Lemma 2.6, modulo Sublemma 2.7. □
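The elementary observation behind tests (i) and (ii) can be tried on simple cubics. This is a minimal sketch with hypothetical helper names and made-up example functions; the caller is assumed to supply a function whose third derivative has the required sign on $[A, B]$, since the helpers do not check it.

```python
def bound_i(h, dh, A, B):
    """Upper bound on max h over [A, B] from test (i); assumes h''' >= 0."""
    return max(h(B), h(A), h(A) + dh(A) * (B - A))

def bound_ii(h, dh, A, B):
    """Upper bound on max h over [A, B] from test (ii); assumes h''' <= 0."""
    return max(h(A), h(B), h(B) + dh(B) * (A - B))

def grid_max(h, A, B, n=1000):
    """Maximum of h over an evenly spaced grid on [A, B]."""
    return max(h(A + (B - A) * i / n) for i in range(n + 1))

# Example for (i): h''' = 0.6 >= 0 on [0, 1].
def h1(u):
    return -u * u - 1 + 0.1 * u ** 3

def dh1(u):
    return -2 * u + 0.3 * u * u

# Example for (ii): h''' = -0.6 <= 0 on [-1, 0].
def h2(u):
    return -u * u - 1 - 0.1 * u ** 3

def dh2(u):
    return -2 * u - 0.3 * u * u
```

In both examples the bound is negative, so the test certifies $h \le 0$ on the whole interval from three evaluations only.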

Proof of Sublemma 2.7. This proof is somewhat similar to, but much simpler
than, that of (2.29). Take indeed any

$$c \in (0, \tfrac12], \quad x \in (0,1), \quad t \in (0, \infty). \tag{2.31}$$

Let here

$$\delta := \delta_t(x,c) := \Lambda_t(x,c) - \Lambda_t(x, 1 - c).$$

As in the proof of Lemma 2.6, w.l.o.g. $t \in (0, 2]$. Observe that $\delta$ is expressed in
terms of the values of the function $\psi_t$ of arguments $z$ in the set $Z := \{z_1, \dots, z_4\}$,
where $(z_1, \dots, z_4) := (1 + x - c,\ x + c,\ |x - c|,\ |x + c - 1|)$. At that, one has w.l.o.g.
one of four possible cases:

(i) $x < c$ & $x + c < 1$, (ii) $x < c$ & $x + c > 1$,
(iii) $x > c$ & $x + c < 1$, (iv) $x > c$ & $x + c > 1$,

on which the algebraic expressions of $|x - c|$ and $|x + c - 1|$ depend; because of
the continuity of $\delta$, it does not matter whether the inequalities here are strict
or not. Actually, case (ii) is impossible, given the restriction $c \in (0, \tfrac12]$. Of the
$4! = 24$ permutations $\sigma$ of the set $\{1, \dots, 4\}$, there are 2 permutations that may
be compatible with restrictions (2.31) in case (i), 0 such permutations in case
(ii), 3 of them in case (iii), and 1 such permutation in case (iv). Let $\Sigma$ denote
here the set of these $2 + 0 + 3 + 1 = 6$ permutations. For each permutation $\sigma \in \Sigma$,
the value of $t$ may fall into one of the 5 intervals $[z_{\sigma(0)}, z_{\sigma(1)}), \dots, [z_{\sigma(4)}, z_{\sigma(5)})$,
where $z_{\sigma(0)} := 0$ and $z_{\sigma(5)} := 2$. Overall, there are $6 \times 5 = 30$ cases to consider.
In each of these cases it is quickly checked by using the Reduce command that
$\delta \le 0$; this takes the total of about 1 sec of computer time. □
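The inequality $\delta \le 0$ can also be spot-checked on a grid. As with the earlier numerical checks, this is only a sketch: it assumes $\psi_t(z) = z^2$ for $|z| \le t$ and $\psi_t(z) = 2t|z| - t^2$ otherwise (a form consistent with, but not stated in, this subsection), and the function names and grid are hypothetical.

```python
def psi(t, z):
    """Assumed form of psi_t: quadratic on [-t, t], linear outside."""
    z = abs(z)
    return z * z if z <= t else 2.0 * t * z - t * t

def Lam(t, x, c):
    """Lambda_t(x, c) = c psi_t(x+1-c) + (1-c) psi_t(x-c) - psi_t(x)."""
    return c * psi(t, x + 1 - c) + (1 - c) * psi(t, x - c) - psi(t, x)

def delta(t, x, c):
    """delta_t(x, c) = Lambda_t(x, c) - Lambda_t(x, 1 - c)."""
    return Lam(t, x, c) - Lam(t, x, 1 - c)

def max_delta_on_grid(n=40):
    """Largest delta over a grid with c in (0, 1/2], x in (0, 1), t in (0, 2]."""
    worst = float("-inf")
    for i in range(1, n + 1):
        c = 0.5 * i / n
        for j in range(1, n):
            x = j / n
            for k in range(1, n + 1):
                t = 2.0 * k / n
                worst = max(worst, delta(t, x, c))
    return worst
```

The grid maximum is expected to be 0 up to rounding, attained for instance at $c = \tfrac12$, where the two terms of $\delta$ coincide.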

Now Theorem 1.1 is completely proved. 